Abstract

In this paper, a new technique for the optimization of (partially) bound queries
over disjunctive Datalog programs with stratified negation is presented. The
technique exploits the propagation of query bindings and extends the Magic Set
optimization technique (originally defined for non-disjunctive programs).

An important feature of disjunctive Datalog programs is nonmonotonicity, which
calls for nondeterministic implementations, such as backtracking search. A
distinguishing characteristic of the new method is that the optimization can be
exploited also during the nondeterministic phase. In particular, after some
assumptions have been made during the computation, parts of the program may
become irrelevant to a query under these assumptions. This allows for dynamic
pruning of the search space. In contrast, the effect of the previously defined
Magic Set methods for disjunctive Datalog is limited to the deterministic portion
of the process. In this way, the potential performance gain by using the proposed
method can be exponential, as could be observed empirically.

The correctness of the method is established and proved in a formal way thanks
to a strong relationship between Magic Sets and unfounded sets that has not been
studied in the literature before. This knowledge allows for extending the method
and the correctness proof also to programs with stratified negation in a natural way.

The proposed method has been implemented in the DLV system and various
experiments on synthetic as well as on real-world data have been conducted. The
experimental results on synthetic data confirm the utility of Magic Sets for
disjunctive Datalog, and they highlight the computational gain that may be obtained
by the new method with respect to the previously proposed Magic Set method for
disjunctive Datalog programs. Further experiments on data taken from a real-life
application show the benefits of the Magic Set method within an application scenario
that has received considerable attention in recent years, the problem of answering
user queries over possibly inconsistent databases originating from integration of
autonomous sources of information.
1 Introduction



Disjunctive Datalog is a language that has been proposed for modeling
incomplete data [48]. Together with a light version of negation, in this paper
stratified negation, this language can in fact express any query of the
complexity class $\Sigma_2^P$ (i.e., $\mathrm{NP}^{\mathrm{NP}}$) [22], under the stable model semantics. It turns
out that disjunctive Datalog with stratified negation is strictly more expres- 
sive (unless the polynomial hierarchy collapses to its first level) than normal 
logic programming (i.e., non-disjunctive Datalog with unstratified negation), 
as the latter can express "only" queries in NP. As shown in [22], the high 
expressive power of disjunctive Datalog has also some positive practical im- 
plications in terms of modelling knowledge, since many problems in NP can 
be represented more simply and naturally in stratified disjunctive Datalog 
than in normal logic programming. For this reason, it is not surprising that 
disjunctive Datalog has found several real-world applications [42,49,50,57,58],
also encouraged by the availability of some efficient inference engines, such as 
DLV [43], GnT [37], Cmodels [46], or ClaspD [21]. As a matter of fact, these 
systems are continuously enhanced to support novel optimization strategies, 
enabling them to be effective over increasingly larger application domains. 
In this paper, we contribute to this development by providing a novel opti- 
mization technique, inspired by deductive database optimization techniques, 
in particular the Magic Set method [6,9,63]. 

The goal of the original Magic Set method (defined for non-disjunctive Datalog
programs) is to exploit the presence of constants in a query for restricting the
possible search space by considering only a subset of a hypothetical program 
instantiation that is sufficient to answer the query in question. In order to 
do this, a top-down computation for answering the query is simulated in an 
abstract way. This top-down simulation is then encoded by means of rules, 
defining new Magic Set predicates. The extensions of these predicates (sets of 
ground atoms) will contain the tuples that are calculated during a top-down 
computation. These predicates are inserted into the original program rules 
and can then be used by bottom-up computations to narrow the computation 
to what is needed for answering the query. 

Extending these ideas to disjunctive Datalog faces a major challenge: While 
non-disjunctive Datalog programs are deterministic, which in terms of the 
stable model semantics means that any non-disjunctive Datalog program has 
exactly one stable model, disjunctive Datalog programs are nondeterministic
in the sense that they may have multiple stable models. Of course, the main 
goal is still isolating a subset of a hypothetical program instantiation, upon 
which the considered query will be evaluated in an equivalent way. There 
are two basic possibilities how this nondeterminism can be dealt with in the 
context of Magic Sets: The first is to consider static Magic Sets, in the sense 
that the definition of the Magic Sets is still deterministic, and therefore the 
extension of the Magic Set predicates is equal in each stable model. This static 
behavior is automatic for Magic Sets of non-disjunctive Datalog programs. The 
second possibility is to allow dynamic Magic Sets, which also introduce non- 
deterministic definitions of Magic Sets. This means that the extension of the 
Magic Set predicates may differ in various stable models, and thus can be 
viewed as being specialized for each stable model. 

While the nature of dynamic Magic Sets intuitively seems to be more fitting 
for disjunctive Datalog than static Magic Sets, considering the architecture of 
modern reasoning systems for disjunctive Datalog substantiates this intuition: 
These systems work in two phases, which may be considered as a determin- 
istic (grounding) and a non-deterministic (model search) part. The interface 
between these two is by means of a ground program, which is produced by the 
deterministic phase. Static Magic Sets will almost exclusively have an impact 
on the grounding phase, while dynamic Magic Sets also have the possibility 
to influence the model search phase. In particular, some assumptions made
during the model search may render parts of the program irrelevant to the 
query, which may be captured by dynamic Magic Sets, but not (or only under 
very specific circumstances) by static Magic Sets. 

In the literature, apart from our own work in [20], there is only one previous 
attempt for defining a Magic Set method for disjunctive Datalog, reported 
in [32,33], which will be referred to as Static Magic Sets (SMS) in this work. 
The basic idea of SMS is that bindings need to be propagated not only from 
rule heads to rule bodies (as in traditional Magic Sets), but also from one 
head predicate to other head predicates. In addition to producing definitions 
for the predicates defining Magic Sets, the method also introduces additional 
auxiliary predicates called collecting predicates. These collecting predicates 
however have a peculiar effect: Their use keeps the Magic Sets static. Indeed, 
both magic and collecting predicates are guaranteed to have deterministic def- 
initions, which implies that disjunctive Datalog systems can exploit the Magic 
Sets only during the grounding phase. Most systems will actually produce a 
ground program which contains neither magic nor collecting predicates.

In this article, we propose a dynamic Magic Set method for disjunctive Datalog 
with stratified negation under the stable model semantics, provide an imple- 
mentation of it in the system DLV, and report on an extensive experimental 
evaluation. In more detail, the contributions are: 
► We present a dynamic Magic Set method for disjunctive Datalog programs
with stratified negation, referred to as Dynamic Magic Sets (DMS). Different 
from the previously proposed static method SMS, existing systems can ex- 
ploit the information provided by the Magic Sets also during their nondeter- 
ministic model search phase. This feature allows for potentially exponential 
performance gains with respect to the previously proposed static method. 

► We formally establish the correctness of DMS. In particular, we prove that 
the program obtained by the transformation DMS is query-equivalent to the 
original program. This result holds for both brave and cautious reasoning. 

► We highlight a strong relationship between Magic Sets and unfounded sets,
which characterize stable models. We can show that the atoms which are 
relevant for answering a query are either true or form an unfounded set, 
which eventually allows us to prove the query-equivalence results. 

► Our results hold for a disjunctive Datalog language with stratified negation 
under the stable model semantics. In the literature, several works deal with 
non-disjunctive Datalog with stratified negation under the well-founded or 
the perfect model semantics, which are special cases of our language. For 
the static method SMS, an extension to disjunctive Datalog with stratified 
negation has previously only been sketched in [33]. 

► We have implemented a DMS optimization module inside the DLV system 
[43]. In this way, we could exploit the internal data-structures of the DLV 
system and embed DMS in the core of DLV. As a result, the technique is 
completely transparent to the end user. The system is available at
http://www.dlvsystem.com/magic/.

► We have conducted extensive experiments on synthetic domains that high- 
light the potential of DMS. We have compared the performance of the DLV 
system without Magic Set optimization with SMS and with DMS. The results 
show that in many cases the Magic Set methods yield a significant perfor- 
mance benefit. Moreover, we can show that the dynamic method DMS can 
yield drastically better performance than the static SMS. Importantly, in 
cases in which DMS cannot be beneficial (if all or most of the instantiated 
program is relevant for answering a query), the overhead incurred is very 
light.

► We also report on experiments which evaluate the impact of DMS on an 
industrial application scenario on real-world data. The application involves 
data integration and builds on several results in the literature (for example 
[5,7,14,16,17,31]), which transform the problem of query answering over in- 
consistent databases (in this context stemming from integrating autonomous 
data sources) into query answering over disjunctive Datalog programs. By 
leveraging these results, DMS can be viewed as a query optimization method 
for inconsistent databases or for data integration systems. The results show 
that DMS can yield significant performance gains for queries of this applica- 
tion. 
Organization. The main body of this article is organized as follows. In
Section 2, preliminaries on disjunctive Datalog and on the Magic Set method 
for non-disjunctive Datalog queries are introduced. Subsequently, in Section 3 
the extension DMS for the case of disjunctive Datalog programs is presented, 
and we show its correctness. In Section 4 we discuss the implementation and 
integration of the Magic Set method within the DLV system. Experimental re- 
sults on synthetic benchmarks are reported in Section 5, while the application 
to data integration and its experimental evaluation is discussed in Section 6. 
Finally, related work is discussed in Section 7, and in Section 8 we draw our 
conclusions. 



2 Preliminaries

In this section, (disjunctive) Datalog programs with (stratified) negation are
briefly described, and the standard Magic Set method is presented together
with the notion of sideways information passing strategy (SIPS) for Datalog 
rules. 

2.1 Disjunctive Datalog Programs with Stratified Negation 

In this paper, we adopt the standard Datalog naming convention: Alphanumeric
strings starting with a lowercase character are predicate or constant symbols,
while alphanumeric strings starting with an uppercase character are variable
symbols; moreover, we allow the use of positive integer constant symbols.
Each predicate symbol is associated with a non-negative integer, referred to
as its arity. An atom $p(\bar t)$ is composed of a predicate symbol $p$ and a list
$\bar t = t_1, \ldots, t_k$ ($k \geq 0$) of terms, each of which is either a constant or a
variable. A literal is an atom $p(\bar t)$ or a negated atom $\mathtt{not}\ p(\bar t)$; in the first
case the literal is positive, while in the second it is negative.

A disjunctive Datalog rule with negation (short: Datalog$^{\vee,\neg}$ rule) $r$ is of the
form

$$p_1(\bar t_1) \vee \cdots \vee p_n(\bar t_n) \;\texttt{:-}\; q_1(\bar s_1), \ldots, q_j(\bar s_j), \mathtt{not}\ q_{j+1}(\bar s_{j+1}), \ldots, \mathtt{not}\ q_m(\bar s_m).$$

where $p_1(\bar t_1), \ldots, p_n(\bar t_n), q_1(\bar s_1), \ldots, q_m(\bar s_m)$ are atoms and $n \geq 1$, $m \geq j \geq 0$.
The disjunction $p_1(\bar t_1) \vee \cdots \vee p_n(\bar t_n)$ is the head of $r$, while the conjunction
$q_1(\bar s_1), \ldots, q_j(\bar s_j), \mathtt{not}\ q_{j+1}(\bar s_{j+1}), \ldots, \mathtt{not}\ q_m(\bar s_m)$ is the body of $r$. Moreover,
$H(r)$ denotes the set of head atoms, while $B(r)$ denotes the set of body literals.
We also use $B^+(r)$ and $B^-(r)$ for denoting the sets of atoms appearing in
positive and negative body literals, respectively. If $r$ is disjunction-free, that
is $n = 1$, and negation-free, that is $B^-(r)$ is empty, then we say that $r$ is a
Datalog rule; if $B^+(r)$ is empty in addition, then we say that $r$ is a fact. A
disjunctive Datalog program $\mathcal{P}$ is a finite set of rules; if all the rules in it are
disjunction- and negation-free, then $\mathcal{P}$ is a (standard) Datalog program.

Given a Datalog$^{\vee,\neg}$ program $\mathcal{P}$, a predicate belongs to the Intensional Database
(IDB) if it is either in the head of a rule with non-empty body, or in the head
of a disjunctive rule; otherwise, it belongs to the Extensional Database (EDB).
The set of rules having IDB predicates in their heads is denoted by $IDB(\mathcal{P})$,
while $EDB(\mathcal{P})$ denotes the remaining rules, that is, $EDB(\mathcal{P}) = \mathcal{P} \setminus IDB(\mathcal{P})$.
For simplicity, we assume that predicates will always be of the same type
(EDB or IDB) in any program.

The set of all constants appearing in a program $\mathcal{P}$ is the universe of $\mathcal{P}$ and is
denoted by $U_{\mathcal{P}}$,^1 while the set of ground atoms constructible from predicates
in $\mathcal{P}$ with constants in $U_{\mathcal{P}}$ is the base of $\mathcal{P}$, denoted by $B_{\mathcal{P}}$. We call an atom
(rule, or program) ground if it does not contain any variables. A substitution
$\vartheta$ is a function from variables to elements of $U_{\mathcal{P}}$. For an expression $S$ (atom,
literal, rule), by $S\vartheta$ we denote the expression obtained from $S$ by substituting
all occurrences of each variable $X$ in $S$ with $\vartheta(X)$. A ground atom $p(\bar t)$ (resp.
ground rule $r_g$) is an instance of an atom $p(\bar t')$ (resp. rule $r$) if there is a
substitution $\vartheta$ from the variables in $p(\bar t')$ (resp. in $r$) to $U_{\mathcal{P}}$ such that
$p(\bar t) = p(\bar t')\vartheta$ (resp. $r_g = r\vartheta$). Given a program $\mathcal{P}$, $Ground(\mathcal{P})$ denotes the set of all
possible instances of rules in $\mathcal{P}$.
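
The definition of $Ground(\mathcal{P})$ can be illustrated with a minimal Python sketch. The representation is an assumption made only for this illustration: an atom is a pair (predicate, argument tuple), a rule is a pair (head atoms, body atoms), and negative literals are left out for brevity. The helper names `is_variable`, `apply_subst` and `ground_rule` are hypothetical and do not correspond to any DLV interface.

```python
from itertools import product

def is_variable(term):
    """Datalog convention: variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def apply_subst(atom, subst):
    """Replace each variable of the atom by its image under the substitution."""
    pred, args = atom
    return (pred, tuple(subst.get(t, t) for t in args))

def ground_rule(rule, universe):
    """Return all instances of a rule (head_atoms, body_atoms) over the universe."""
    head, body = rule
    variables = sorted({t for _, args in head + body for t in args if is_variable(t)})
    instances = []
    # One instance per substitution from the rule variables to the universe.
    for values in product(sorted(universe), repeat=len(variables)):
        subst = dict(zip(variables, values))
        instances.append(([apply_subst(a, subst) for a in head],
                          [apply_subst(a, subst) for a in body]))
    return instances

# Illustrative rule: path(X,Y) :- edge(X,Z), path(Z,Y).
r2 = ([("path", ("X", "Y"))], [("edge", ("X", "Z")), ("path", ("Z", "Y"))])
instances = ground_rule(r2, {1, 3, 5})
print(len(instances))  # three variables over three constants
```

The number of instances grows exponentially with the number of variables in a rule, which is why practical grounders produce only a relevant subset of the full instantiation.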

Given an atom $p(\bar t)$ and a set of ground atoms $A$, by $A|_{p(\bar t)}$ we denote the set
of ground instances of $p(\bar t)$ belonging to $A$. For example, $B_{\mathcal{P}}|_{p(\bar t)}$ is the set of
all ground atoms obtained by applying to $p(\bar t)$ all the possible substitutions
from the variables in $p(\bar t)$ to $U_{\mathcal{P}}$, that is, the set of all the instances of $p(\bar t)$.
Abusing notation, if $B$ is a set of atoms, by $A|_B$ we denote the union of all
$A|_{p(\bar t)}$, for each $p(\bar t) \in B$.

A desirable property of Datalog$^{\vee,\neg}$ programs is safety. A Datalog$^{\vee,\neg}$ rule $r$ is
safe if each variable appearing in $r$ appears in at least one atom of $B^+(r)$. A
Datalog$^{\vee,\neg}$ program is safe if all its rules are safe. Moreover, programs without
recursion over negated literals constitute an interesting class of Datalog$^{\vee,\neg}$
programs. Without going into details, a predicate $p$ in the head of a rule $r$
depends on all the predicates $q$ in the body of $r$; $p$ depends on $q$ positively if
$q$ appears in $B^+(r)$, and $p$ depends on $q$ negatively if $q$ appears in $B^-(r)$. A
program has recursion over negation if a cycle of dependencies with at least one
negative dependency exists. If a program has no recursion over negation, then
the program is stratified (short: Datalog$^{\vee,\neg_s}$). In this work only safe programs
without recursion over negation are considered.

^1 If $\mathcal{P}$ has no constants, an arbitrary constant is added to $U_{\mathcal{P}}$.

An interpretation for a program $\mathcal{P}$ is a subset $I$ of $B_{\mathcal{P}}$. A positive ground
literal $p(\bar t)$ is true with respect to an interpretation $I$ if $p(\bar t) \in I$; otherwise, it
is false. A negative ground literal $\mathtt{not}\ p(\bar t)$ is true with respect to $I$ if and only
if $p(\bar t)$ is false with respect to $I$, that is, if and only if $p(\bar t) \notin I$. The body of a
ground rule $r$ is true with respect to $I$ if and only if all the body literals of $r$
are true with respect to $I$, that is, if and only if $B^+(r) \subseteq I$ and $B^-(r) \cap I = \emptyset$.
An interpretation $I$ satisfies a ground rule $r \in Ground(\mathcal{P})$ if at least one atom
in $H(r)$ is true with respect to $I$ whenever the body of $r$ is true with respect
to $I$. An interpretation $I$ is a model of a Datalog$^{\vee,\neg}$ program $\mathcal{P}$ if $I$ satisfies
all the rules in $Ground(\mathcal{P})$. Since an interpretation is a set of atoms, if $I$ is an
interpretation for a program $\mathcal{P}$, and $\mathcal{P}'$ is another program, then by $I|_{B_{\mathcal{P}'}}$ we
denote the restriction of $I$ to the base of $\mathcal{P}'$.

Given an interpretation $I$ for a program $\mathcal{P}$, the reduct of $\mathcal{P}$ with respect to
$I$, denoted by $Ground(\mathcal{P})^I$, is obtained by deleting from $Ground(\mathcal{P})$ all the
rules $r_g$ with $B^-(r_g) \cap I \neq \emptyset$, and then by removing all the negative literals
from the remaining rules.

The semantics of a Datalog$^{\vee,\neg}$ program $\mathcal{P}$ is given by the set $\mathcal{SM}(\mathcal{P})$ of
stable models of $\mathcal{P}$, where an interpretation $M$ is a stable model for $\mathcal{P}$ if and
only if $M$ is a subset-minimal model of $Ground(\mathcal{P})^M$. It is well-known that
there is exactly one stable model for any Datalog program, also in presence
of stratified negation. However, for a Datalog$^{\vee,\neg_s}$ program $\mathcal{P}$, $|\mathcal{SM}(\mathcal{P})| \geq 1$
holds (Datalog$^{\vee,\neg}$ programs, instead, can also have no stable model).
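
The definitions of reduct and stable model can be checked on small ground programs by brute force. The following Python sketch is only an illustration under stated assumptions: a ground rule is a triple (head, positive body, negative body) of atom-name sets, all candidate interpretations are enumerated (exponential in the number of atoms), and the helper names `rule`, `reduct`, `is_model`, `subsets` and `stable_models` are hypothetical. It is not how DLV computes stable models.

```python
from itertools import chain, combinations

def rule(head, pos=(), neg=()):
    """A ground rule as (head atoms, positive body atoms, negative body atoms)."""
    return (frozenset(head), frozenset(pos), frozenset(neg))

def reduct(program, interp):
    """Delete rules whose negative body intersects interp, then drop negative literals."""
    return [(h, p, frozenset()) for (h, p, n) in program if not (n & interp)]

def is_model(program, interp):
    """Every rule with a true body must have at least one true head atom."""
    for h, p, n in program:
        body_true = p <= interp and not (n & interp)
        if body_true and not (h & interp):
            return False
    return True

def subsets(atoms):
    """All subsets of a set of atoms, smallest first."""
    atoms = sorted(atoms)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1)))

def stable_models(program):
    base = set()
    for h, p, n in program:
        base |= h | p | n
    result = []
    for m in subsets(base):
        red = reduct(program, m)
        if not is_model(red, m):
            continue
        # Subset-minimality: no proper subset of m may be a model of the reduct.
        if any(is_model(red, s) for s in subsets(m) if s != m):
            continue
        result.append(m)
    return result

# Illustrative stratified program:  a v b.   c :- a.   d :- not a.   e.
program = [rule({"a", "b"}), rule({"c"}, pos={"a"}),
           rule({"d"}, neg={"a"}), rule({"e"})]
models = stable_models(program)
print([sorted(m) for m in models])
```

For this program the sketch finds the two stable models {a, c, e} and {b, d, e}, in line with the remark that a stratified disjunctive program has at least one stable model but may have several.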

Given a ground atom $p(\bar t)$ and a Datalog$^{\vee,\neg}$ program $\mathcal{P}$, $p(\bar t)$ is a cautious
(or certain) consequence of $\mathcal{P}$, denoted by $\mathcal{P} \models_c p(\bar t)$, if $p(\bar t) \in M$ for each
$M \in \mathcal{SM}(\mathcal{P})$; $p(\bar t)$ is a brave (or possible) consequence of $\mathcal{P}$, denoted by
$\mathcal{P} \models_b p(\bar t)$, if $p(\bar t) \in M$ for some $M \in \mathcal{SM}(\mathcal{P})$. Note that brave and cautious
consequences coincide for Datalog programs, as these programs have a unique
stable model. Moreover, cautious consequences of a Datalog$^{\vee,\neg_s}$ program $\mathcal{P}$
are also brave consequences of $\mathcal{P}$ because $|\mathcal{SM}(\mathcal{P})| \geq 1$ holds in this case.

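Given a query $\mathcal{Q} = g(\bar t)?$ (an atom),^2 $Ans_c(\mathcal{Q}, \mathcal{P})$ denotes the set of all
substitutions $\vartheta$ for the variables of $g(\bar t)$ such that $\mathcal{P} \models_c g(\bar t)\vartheta$, while $Ans_b(\mathcal{Q}, \mathcal{P})$
denotes the set of substitutions $\vartheta$ for the variables of $g(\bar t)$ such that $\mathcal{P} \models_b g(\bar t)\vartheta$.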

Let $\mathcal{P}$ and $\mathcal{P}'$ be two Datalog$^{\vee,\neg}$ programs and $\mathcal{Q}$ a query. Then $\mathcal{P}$ and $\mathcal{P}'$
are brave-equivalent with respect to $\mathcal{Q}$, denoted by $\mathcal{P} \equiv^b_{\mathcal{Q}} \mathcal{P}'$, if
$Ans_b(\mathcal{Q}, \mathcal{P} \cup \mathcal{F}) = Ans_b(\mathcal{Q}, \mathcal{P}' \cup \mathcal{F})$ is guaranteed for each set of facts $\mathcal{F}$ defined over
predicates which are EDB predicates of $\mathcal{P}$ or $\mathcal{P}'$; similarly, $\mathcal{P}$ and $\mathcal{P}'$ are
cautious-equivalent with respect to $\mathcal{Q}$, denoted by $\mathcal{P} \equiv^c_{\mathcal{Q}} \mathcal{P}'$, if
$Ans_c(\mathcal{Q}, \mathcal{P} \cup \mathcal{F}) = Ans_c(\mathcal{Q}, \mathcal{P}' \cup \mathcal{F})$ is guaranteed for each set of facts $\mathcal{F}$ defined over
predicates which are EDB predicates of $\mathcal{P}$ or $\mathcal{P}'$.

^2 Note that more complex queries can still be expressed using appropriate rules.
We assume that each constant appearing in $\mathcal{Q}$ also appears in $\mathcal{P}$; if this is not the
case, then we can add to $\mathcal{P}$ a fact $p(\bar t)$ such that $p$ is a predicate not occurring
in $\mathcal{P}$ and $\bar t$ are the arguments of $\mathcal{Q}$. Question marks will be usually omitted when
referring to queries in the text.

2.2 Bottom-up Disjunctive Datalog Computation 

Many Datalog$^{\vee,\neg}$ systems implement a two-phase computation. The first
phase, referred to as program instantiation or grounding, is bottom-up. For
an input program $\mathcal{P}$, it produces a ground program which is equivalent to
$Ground(\mathcal{P})$, but significantly smaller. Most of the techniques used in this phase
stem from bottom-up methods developed for classic and deductive databases; 
see for example [1] or [28,43] for details. Essentially, predicate instances which 
are known to be true or known to be false are identified and this knowledge 
is used for deriving further instances of this kind. Eventually, the truth values 
obtained in this way are used to produce rule instances which are not satisfied 
already. It is important to note that this phase behaves in a deterministic 
way with respect to stable models. No assumptions about truth or falsity of 
atoms are made, only definite knowledge is derived, which must hold in all 
stable models. For this reason, programs with multiple stable models cannot 
be solved by grounding. 

The second phase is often referred to as stable model search and takes care of 
the non-deterministic computation. Essentially, one undefined atom is selected 
and its truth or falsity is assumed. The assumption might imply truth or 
falsity of other undefined atoms. Hence, the process is repeated until either an
inconsistency is derived or all atoms have been interpreted. In the latter case an 
additional check is performed to ensure stability of the model. Details on this 
process can be found for example in [23]. Query answering is typically handled
by storing all admissible answer substitutions as stable models are computed.
For brave reasoning, each stable model can contribute substitutions to the 
set of answers. In this case the set of answers is initially empty. For cautious 
reasoning, instead, each stable model may eliminate some substitutions from 
the set of admissible answers. Therefore, in this case all possible substitutions 
for the input query are initially contained in the set of answers. 
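
The two accumulation schemes described above can be sketched in Python as follows. As a simplifying assumption, candidate answers are represented directly as ground atoms rather than substitutions, and the stable models are supplied as a list instead of being produced one by one by a model search; the function names are hypothetical.

```python
def brave_answers(stable_models, candidates):
    """Start from the empty set; each stable model may contribute answers."""
    answers = set()
    for model in stable_models:
        answers |= candidates & model
    return answers

def cautious_answers(stable_models, candidates):
    """Start from all candidates; each stable model may eliminate answers."""
    answers = set(candidates)
    for model in stable_models:
        answers &= model
    return answers

# Illustrative input: two stable models and four candidate answers for p(X).
models = [{("p", 1), ("p", 2)}, {("p", 2), ("p", 3)}]
candidates = {("p", 1), ("p", 2), ("p", 3), ("p", 4)}
brave = brave_answers(models, candidates)
cautious = cautious_answers(models, candidates)
```

In this example `brave` contains p(1), p(2) and p(3), while `cautious` contains only p(2). Because the cautious set can only shrink, a model search may stop early once it becomes empty.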

2.3 Sideways Information Passing for Datalog Rules 

The Magic Set method aims at simulating a top-down evaluation of a query
$\mathcal{Q}$, like for instance the one adopted by Prolog. According to this kind of
evaluation, all the rules $r$ such that $p(\bar t) \in H(r)$ and $H(r)\vartheta = \{\mathcal{Q}\vartheta'\}$ (for
some substitution $\vartheta$ for all the variables of $r$ and some substitution $\vartheta'$ for all
the variables of $\mathcal{Q}$) are considered in a first step. Then the atoms in $B^+(r\vartheta)$
are taken as subqueries (we recall that standard Datalog rules have empty
negative body), and the procedure is iterated. Note that, according to this
process, if a (sub)query has some argument that is bound to a constant value,
this information is "passed" to the atoms in the body. Moreover, the body is
considered to be processed in a certain sequence, and processing a body atom
may bind some of its arguments for subsequently considered body atoms, thus
"generating" and "passing" bindings within the body. Whenever a body atom
is processed, each of its arguments is therefore considered to be either bound
or free. We illustrate this mechanism by means of an example.

Example 2.1 Let path(1,5) be a query for a program having the following
inference rules:

r1 : path(X,Y) :- edge(X,Y).

r2 : path(X,Y) :- edge(X,Z), path(Z,Y).

Since this is a Datalog program, brave and cautious consequences coincide.
Moreover, let $\mathcal{F}_1$ = {edge(1,3), edge(2,4), edge(3,5)} be the EDB of the
program. A top-down evaluation scheme considers r1 and r2 with X and Y
bound to 1 and 5, respectively. In particular, when considering r1, the
information about the binding of the two variables is passed to edge(X,Y), which
is indeed the only query atom occurring in r1. Thus, the evaluation fails since
edge(1,5) does not occur in $\mathcal{F}_1$.

When considering r2, instead, the binding information can be passed either 
to path(Z,Y) or to edge(X, Z). Suppose that atoms are evaluated according 
to their ordering in the rule (from left to right); then edge(X,Z) is considered
before path(Z,Y). In particular, $\mathcal{F}_1$ contains the atom edge(1,3), which
leads us to map Z to 3. Eventually, this inferred binding information might 
be propagated to the remaining body atom path(Z,Y), which hence becomes 
path(3, 5). 

The process has now to be repeated by looking for an answer to path(3,5).
Again, rule r1 can be considered, from which we conclude that this query is
true since edge(3,5) occurs in $\mathcal{F}_1$. Thus, path(1,5) holds as well due to r2. □
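
The top-down evaluation of Example 2.1 can be mimicked by a small Python sketch. It hard-codes the two path rules and the left-to-right body ordering, and it records every subquery it processes. The function name `prove_path` and the `seen` set, which guards against cycles in the edge relation, are assumptions of this sketch and not part of the method described in the text.

```python
def prove_path(x, y, edges, trace, seen=None):
    """Top-down evaluation of the subquery path(x, y) with both arguments bound."""
    if seen is None:
        seen = set()
    if (x, y) in seen:  # subquery already attempted: avoid looping on cycles
        return False
    seen.add((x, y))
    trace.append(("path", x, y))
    # Rule r1: path(X,Y) :- edge(X,Y).
    if (x, y) in edges:
        return True
    # Rule r2: path(X,Y) :- edge(X,Z), path(Z,Y).  Body processed left to right:
    # edge(x, Z) binds Z, and the binding is passed on to path(Z, y).
    for src, z in sorted(edges):
        if src == x and prove_path(z, y, edges, trace, seen):
            return True
    return False

edges = {(1, 3), (2, 4), (3, 5)}
trace = []
result = prove_path(1, 5, edges, trace)
print(result, trace)
```

The trace contains only the subqueries path(1,5) and path(3,5); the fact edge(2,4) is never inspected for a path subquery, which is exactly the kind of irrelevant data the Magic Set rewriting is meant to exclude from a bottom-up computation.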

Note that in the example above we have two degrees of freedom in the spec- 
ification of the top-down evaluation scheme. The first one concerns which 
ordering is used for processing the body atoms. While Prolog systems are usu- 
ally required to follow the ordering in which the program is written, Datalog 
has a purely declarative semantics which is independent of the body order- 
ing, allowing for an arbitrary ordering to be adopted. The second degree of 
freedom is slightly more subtle, and concerns the selection of the terms to be
considered bound to constants from previous evaluations. Indeed, while we 
have considered the propagation of all the binding information that originates 
from previously processed body atoms, it is in general possible to restrict 
the top-down evaluation to partially propagate this information. For instance, 
one may desire to propagate only information generated from the evaluation 
of EDB predicates, or even just the information that is passed on via the head 
atom. 

The specific propagation strategy adopted in the top-down evaluation scheme
is called sideways information passing strategy (SIPS), which is just a way of
formalizing a partial ordering over the atoms of each rule together with the
specification of how the bindings originate and propagate [9,33]. To formalize
this concept, in what follows, for each IDB atom $p(\bar t)$, we shall denote its
associated binding information (originated in a certain step of the top-down
evaluation) by means of a string $\alpha$ built over the letters $b$ and $f$, denoting
"bound" and "free", respectively, for each argument of $p(\bar t)$.

Definition 2.2 (SIPS for Datalog rules) A SIPS for a Datalog rule $r$ with
respect to a binding $\alpha$ for the atom $p(\bar t) \in H(r)$ is a pair $(\prec_r^\alpha, f_r^\alpha)$, where:

(1) $\prec_r^\alpha$ is a strict partial order over the atoms in $H(r) \cup B^+(r)$, such that
$p(\bar t) \prec_r^\alpha q(\bar s)$, for all atoms $q(\bar s) \in B^+(r)$; and,

(2) $f_r^\alpha$ is a function assigning to each atom $q(\bar s) \in H(r) \cup B^+(r)$ a subset of
the variables in $\bar s$ (intuitively, those made bound when processing $q(\bar s)$).

Intuitively, for each atom $q(\bar s)$ occurring in $r$, the strict partial order $\prec_r^\alpha$
specifies those atoms that have to be processed before processing atom $q(\bar s)$.
Eventually, an argument $X$ of $q(\bar s)$ is bound to a constant if there exists an atom
$q'(\bar s')$ such that $q'(\bar s') \prec_r^\alpha q(\bar s)$ and $X \in f_r^\alpha(q'(\bar s'))$. Note that the head atom
$p(\bar t)$ precedes all other atoms in $\prec_r^\alpha$.

Example 2.3 The SIPS we have adopted in Example 2.1 for r1 with respect
to the binding bb (originating from the query path(1,5)) can be formalized as
the pair $(\prec_{r_1}^{bb}, f_{r_1}^{bb})$, where path(X,Y) $\prec_{r_1}^{bb}$ edge(X,Y), $f_{r_1}^{bb}$(path(X,Y)) = {X,Y},
and $f_{r_1}^{bb}$(edge(X,Y)) = $\emptyset$. Instead, the SIPS we have adopted for r2 with respect
to the binding bb can be formalized as the pair $(\prec_{r_2}^{bb}, f_{r_2}^{bb})$, where path(X,Y) $\prec_{r_2}^{bb}$
edge(X,Z) $\prec_{r_2}^{bb}$ path(Z,Y), $f_{r_2}^{bb}$(path(X,Y)) = {X,Y}, $f_{r_2}^{bb}$(edge(X,Z)) = {Z}, and
$f_{r_2}^{bb}$(path(Z,Y)) = $\emptyset$. □

All the algorithms and techniques we shall develop in this paper are orthogonal
with respect to the underlying SIPSes to be used in the top-down evaluation.
Thus, in Section 2.4, we shall assume that Datalog programs are provided in
input together with some arbitrarily defined SIPS $(\prec_r^\alpha, f_r^\alpha)$, for each rule $r$
and for each possible adornment $\alpha$ for the head atom in $H(r)$.
2.4 Magic Sets for Datalog Programs



The Magic Set method is a strategy for simulating the top-down evaluation 
of a query by modifying the original program by means of additional rules, 
which narrow the computation to what is relevant for answering the query. 
We next provide a brief and informal description of the Magic Set rewriting 
technique. The reader is referred to [63] for a detailed presentation. 

The method is structured in four main phases, which are informally illustrated 
below by means of Example 2.1. 

(1) Adornment. The key idea is to materialize the binding information for
IDB predicates that would be propagated during a top-down computation. In
particular, the fact that an IDB predicate $p(\bar t)$ is associated with a binding
information $\alpha$ (i.e., a string over the letters $b$ and $f$, one for each term in
$\bar t$) is denoted by the atom obtained adorning the predicate symbol with the
binding at hand, that is, by $p^\alpha(\bar t)$. In what follows, the predicate $p^\alpha$ is said to
be an adorned predicate.

First, adornments are created for query predicates so that an argument
occurring in the query is adorned with the letter $b$ if it is a constant, or with the
letter $f$ if it is a variable. For instance, the adorned version of the query atom
path(1,5) is path^bb(1,5), which gives rise to the adorned predicate path^bb.

Each adorned predicate is eventually used to propagate its information into
the body of the rules defining it according to a SIPS, thereby simulating
a top-down evaluation. In particular, assume that the binding $\alpha$ has to be
propagated into a rule $r$ whose head is $p(\bar t)$. Thus, the associated SIPS $(\prec_r^\alpha, f_r^\alpha)$
determines which variables will be bound in the evaluation of the various body
atoms. Indeed, a variable $X$ of an atom $q(\bar s)$ in $r$ is bound if and only if either

(1) $X \in f_r^\alpha(q(\bar s))$ with $q(\bar s) = p(\bar t)$; or,

(2) $X \in f_r^\alpha(b(\bar z))$ for an atom $b(\bar z) \in B^+(r)$ such that $b(\bar z) \prec_r^\alpha q(\bar s)$ holds.

Adorning a rule $r$ with respect to an adorned predicate $p^\alpha$ means propagating
the binding information $\alpha$, starting from the head predicate $p(\bar t) \in H(r)$,
thereby creating a novel adorned rule where all the IDB predicates in $r$ are
substituted by the adorned predicates originating from the binding according
to (1) and (2).

Example 2.4 Adorning the query path(1,5) generates path^bb(1,5). Then,
propagating the binding information bb into the rule r1, i.e., when adorning r1
with path^bb, produces the following adorned rule (recall here that adornments
apply only to IDB predicates, whereas edge is an EDB predicate):

r1^a : path^bb(X,Y) :- edge(X,Y).

Instead, when propagating bb into the rule r2 according to the SIPS $(\prec_{r_2}^{bb}, f_{r_2}^{bb})$
defined in Example 2.3, we obtain the following adorned rule:

r2^a : path^bb(X,Y) :- edge(X,Z), path^bb(Z,Y). □

While adorning rules, novel binding information in the form of yet unseen
adorned predicates may be generated, which should be used for adorning
other rules. In fact, the adornment step is repeated until all bindings have
been processed, yielding the adorned program, which is the set of all adorned
rules created during the computation. For instance, in the above example, the
adorned program just consists of r1^a and r2^a, since no adorned predicate different
from path^bb is generated.
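
The adornment step can be sketched in Python under one specific, assumed SIPS: body atoms are processed from left to right and every processed atom makes all of its variables bound. The text allows any SIPS, so this is just one admissible choice. Atoms are pairs (predicate, argument tuple), adorned atoms are triples (predicate, adornment, arguments) with an empty adornment for EDB atoms, and all function names are hypothetical.

```python
def is_variable(term):
    """Datalog convention: variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()

def adornment(args, bound):
    """One letter per argument: b for constants and bound variables, f otherwise."""
    return "".join("b" if not is_variable(t) or t in bound else "f" for t in args)

def adorn_rule(head, body, alpha, idb):
    """Propagate the head binding alpha into the body, left to right."""
    head_pred, head_args = head
    bound = {t for t, flag in zip(head_args, alpha) if flag == "b" and is_variable(t)}
    adorned_body = []
    for pred, args in body:
        beta = adornment(args, bound) if pred in idb else ""
        adorned_body.append((pred, beta, args))
        bound |= {t for t in args if is_variable(t)}  # processed atom binds its variables
    return (head_pred, alpha, head_args), adorned_body

def adorn_program(rules, query, idb):
    """Repeat the adornment step until no unseen adorned predicate remains."""
    pending, done, adorned = [(query[0], adornment(query[1], set()))], set(), []
    while pending:
        pred, alpha = pending.pop()
        if (pred, alpha) in done:
            continue
        done.add((pred, alpha))
        for head, body in rules:
            if head[0] == pred:
                a_head, a_body = adorn_rule(head, body, alpha, idb)
                adorned.append((a_head, a_body))
                pending.extend((p, b) for p, b, _ in a_body if b)
    return adorned

rules = [(("path", ("X", "Y")), [("edge", ("X", "Y"))]),
         (("path", ("X", "Y")), [("edge", ("X", "Z")), ("path", ("Z", "Y"))])]
adorned_program = adorn_program(rules, ("path", (1, 5)), {"path"})
```

For the query path(1,5) the sketch produces the two adorned rules of Example 2.4 and stops, because the only adorned predicate encountered is path^bb.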

(2) Generation. In the second step of the Magic Set method, the adorned
program is used to generate magic rules, which are used to simulate the
top-down evaluation scheme and to single out the atoms relevant for answering
the input query. For an adorned atom $p^\alpha(\bar t)$, let $\mathit{magic}(p^\alpha(\bar t))$ be its magic version,
defined as the atom $\mathit{magic\_p}^\alpha(\bar t')$, where $\bar t'$ is obtained from $\bar t$ by eliminating
all arguments corresponding to an $f$ label in $\alpha$, and where $\mathit{magic\_p}^\alpha$ is a new
predicate symbol (for simplicity denoted by attaching the prefix "magic_"
to the predicate symbol $p^\alpha$). Intuitively, $\mathit{magic\_p}^\alpha(\bar t')\vartheta$ ($\vartheta$ a substitution) is
inferred by the rules of the rewritten program whenever a top-down evaluation
of the original program would process a subquery of the form $p^\alpha(\bar t'')$, where $\bar t''$
is obtained from $\bar t$ by applying $\vartheta$ to all terms in $\bar t'$.

Thus, if $q_i^{\beta_i}(\bar s_i)$ is an adorned atom (i.e., $\beta_i$ is not the empty string) in the body
of an adorned rule $r^a$ having $p^\alpha(\bar t)$ in head, a magic rule $r^*$ is generated such
that (i) $H(r^*) = \{\mathit{magic}(q_i^{\beta_i}(\bar s_i))\}$ and (ii) $B(r^*)$ is the union of $\{\mathit{magic}(p^\alpha(\bar t))\}$
and the set of all the atoms $q_j^{\beta_j}(\bar s_j) \in B^+(r^a)$ such that $q_j(\bar s_j) \prec_r^\alpha q_i(\bar s_i)$.

Example 2.5 In our running example, only one magic rule is generated,

$r_2^*$ : magic_path^bb(Z,Y) :- magic_path^bb(X,Y), edge(X,Z).

In fact, the adorned rule $r_1^a$ does not produce any magic rule, since there is no
adorned predicate in $B^+(r_1^a)$. □

(3) Modification. The adorned rules are subsequently modified by adding
magic atoms to their bodies. These magic atoms limit the range of the head
variables, avoiding the inference of facts which cannot contribute to the
derivation of the query. In particular, each adorned rule $r^a$, whose head atom is $p^\alpha(\bar t)$,
is modified by adding the atom $\mathit{magic}(p^\alpha(\bar t))$ to its body. The resulting rules
are called modified rules.

Example 2.6 In our running example, the following modified rules are
generated:

$r_1'$ : path^bb(X,Y) :- magic_path^bb(X,Y), edge(X,Y).
$r_2'$ : path^bb(X,Y) :- magic_path^bb(X,Y), edge(X,Z), path^bb(Z,Y).

□



(4) Processing the Query. Finally, given the adorned predicate $g^\alpha$ obtained
when adorning a query $g(\bar t)$, (1) a magic seed $\mathit{magic}(g^\alpha(\bar t))$ (a fact) and (2)
a rule $g(\bar t)$ :- $g^\alpha(\bar t)$ are produced. In our example, magic_path^bb(1,5) and
path(X,Y) :- path^bb(X,Y) are generated.

The complete rewritten program according to the Magic Set method consists
of the magic, modified, and query rules (together with the original EDB).
Given a Datalog program $\mathcal{P}$, a query $\mathcal{Q}$, and the rewritten program $\mathcal{P}'$, it is
well-known that $\mathcal{P}$ and $\mathcal{P}'$ are equivalent with respect to $\mathcal{Q}$, i.e., $\mathcal{P} \equiv^b_{\mathcal{Q}} \mathcal{P}'$ and



Example 2.7 The complete rewriting of our running example is as follows: ^

magic_path^bb(1,5).

path(X,Y) :- path^bb(X,Y).
$r_2^*$ : magic_path^bb(Z,Y) :- magic_path^bb(X,Y), edge(X,Z).
$r_1'$ : path^bb(X,Y) :- magic_path^bb(X,Y), edge(X,Y).
$r_2'$ : path^bb(X,Y) :- magic_path^bb(X,Y), edge(X,Z), path^bb(Z,Y).

In this rewriting, magic_path^bb(X,Y) represents a potential sub-path of the
paths from 1 to 5. Therefore, when answering the query, only these sub-paths
will be actually considered in the bottom-up computation. One can check that
this rewriting is in fact equivalent to the original program with respect to the
query path(1,5). □
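
The effect of the rewriting in Example 2.7 can be observed with a small bottom-up evaluation. The following Python sketch is only an illustration: the function names, the hand-coded fixpoint loops, and the sample edge relation are assumptions made here, not part of the method's definition. It evaluates the original path program and the rewritten one and compares the facts each of them derives.

```python
def paths_original(edges):
    """Naive bottom-up evaluation of the original program:
    path(X,Y) :- edge(X,Y).   path(X,Y) :- edge(X,Z), path(Z,Y)."""
    path = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, z) in edges:
            for (z2, y) in list(path):
                if z == z2 and (x, y) not in path:
                    path.add((x, y))
                    changed = True
    return path


def paths_magic(edges, source, target):
    """Bottom-up evaluation of the rewritten program for path(source, target)."""
    # Magic seed: magic_path^bb(source, target).
    magic = {(source, target)}
    changed = True
    while changed:
        changed = False
        # magic_path^bb(Z,Y) :- magic_path^bb(X,Y), edge(X,Z).
        for (x, y) in list(magic):
            for (x2, z) in edges:
                if x == x2 and (z, y) not in magic:
                    magic.add((z, y))
                    changed = True
    path_bb = set()
    changed = True
    while changed:
        changed = False
        for (x, y) in magic:
            # path^bb(X,Y) :- magic_path^bb(X,Y), edge(X,Y).
            if (x, y) in edges and (x, y) not in path_bb:
                path_bb.add((x, y))
                changed = True
            # path^bb(X,Y) :- magic_path^bb(X,Y), edge(X,Z), path^bb(Z,Y).
            for (x2, z) in edges:
                if x2 == x and (z, y) in path_bb and (x, y) not in path_bb:
                    path_bb.add((x, y))
                    changed = True
    return magic, path_bb


# Hypothetical EDB: one chain 1 -> 2 -> 5 and an unrelated chain 3 -> 4 -> 6 -> 7.
edges = {(1, 2), (2, 5), (3, 4), (4, 6), (6, 7)}
all_paths = paths_original(edges)
magic, path_bb = paths_magic(edges, 1, 5)
print((1, 5) in all_paths, (1, 5) in path_bb)
print(len(all_paths), len(path_bb))
```

On this sample EDB both programs agree on the query atom, while the rewritten program never derives path facts for the chain starting at node 3, since no magic atom is generated for it.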



3 Magic Set Method for Datalog$^{\vee,\neg_s}$ Programs

In this section we present the Dynamic Magic Set algorithm (DMS) for the op- 
timization of disjunctive programs with stratified negation. Before discussing 
the details of the algorithm, we informally present the main ideas that have 
been exploited for enabling the Magic Set method to work on disjunctive pro- 
grams (without negation). 

3.1 Overview of Binding Propagation in Datalog$^{\vee}$ Programs

As first observed in [33], while in non-disjunctive programs bindings are propa- 
gated only head-to-body, a Magic Set transformation for disjunctive programs 

^ The Magic Set rewriting of a program $\mathcal{P}$ affects only $IDB(\mathcal{P})$, so we usually omit
$EDB(\mathcal{P})$ in examples.



$\mathcal{P} \equiv^c_{\mathcal{Q}} \mathcal{P}'$ hold [63].
has to propagate bindings also head-to-head in order to preserve soundness.
Roughly, suppose that a predicate $p$ is relevant for the query, and a disjunctive
rule $r$ contains $p(X)$ in the head. Then, besides propagating the binding from
$p(X)$ to the body of $r$ (as in the non-disjunctive case), the binding must also
be propagated from $p(X)$ to the other head atoms of $r$. The reason is that
any atom which is true in a stable model needs a supporting rule, which is a
rule with a true body and in which the atom in question is the only true head
atom. Therefore, $r$ can yield support to the truth of $p(X)$ only if all other
head atoms are false, which is due to the implicit minimality criterion in the
semantics.

Consider, for instance, a Datalog$^{\vee}$ program $\mathcal{P}$ consisting of the rule p(X) ∨
q(Y) :- a(X,Y), b(X), and the query p(1). Even though the query propagates
the binding for the predicate p, in order to correctly answer the query we also 
need to evaluate the truth value of q(Y), which indirectly receives the binding 
through the body predicate a(X,Y). For instance, suppose that the program 
contains the facts a(l, 2) and b(l); then the atom q(2) is relevant for the query 
p(l) (i.e., it should belong to the Magic Set of the query), since the truth of 
q(2) would invalidate the derivation of p(l) from the above rule, due to the 
minimality of the semantics. It follows that, while propagating the binding, 
the head atoms of disjunctive rules must be all adorned as well. 
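
The role of q(2) in this example can be checked by brute force. The sketch below is illustrative only (the helper name and the encoding of ground atoms as strings are assumptions made here). It enumerates the minimal models of the ground program, which coincide with the stable models because the program is positive.

```python
from itertools import combinations


def minimal_models(atoms, rules):
    """Enumerate the minimal models of a ground positive disjunctive program.

    Each rule is a pair (head, body) of frozensets of ground atoms; a fact is
    a rule with an empty body. For positive programs the minimal models are
    exactly the stable models.
    """
    def is_model(m):
        # A rule is satisfied if its body is not fully true or some head atom is true.
        return all(not body <= m or head & m for head, body in rules)

    models = [set(c) for n in range(len(atoms) + 1)
              for c in combinations(atoms, n) if is_model(set(c))]
    return [m for m in models if not any(other < m for other in models)]


atoms = ["a(1,2)", "b(1)", "p(1)", "q(2)"]
rules = [
    (frozenset({"a(1,2)"}), frozenset()),                                # fact a(1,2)
    (frozenset({"b(1)"}), frozenset()),                                  # fact b(1)
    (frozenset({"p(1)", "q(2)"}), frozenset({"a(1,2)", "b(1)"})),        # p(1) v q(2) :- a(1,2), b(1)
]
for model in minimal_models(atoms, rules):
    print(sorted(model))
```

The program has two stable models, one containing p(1) and one containing q(2), so p(1) holds in some but not all of them. Deciding the query therefore requires considering q(2) as well.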

However, the adornment of the head of one disjunctive rule $r$ may give rise
to multiple rules, having different adornments for the head predicates. This
process can be somehow seen as "splitting" $r$ into multiple rules. While this
is not a problem in the non-disjunctive case, the semantics of a disjunctive
program may be affected. Consider, for instance, the program consisting of
the rule p(X,Y) ∨ q(Y,X) :- a(X,Y), in which p and q are mutually exclusive
(due to minimality) since they do not appear in any other rule head. Assuming
the adornments p^bf and q^bf to be propagated, we might obtain rules whose
heads have the form p^bf(X,Y) ∨ q^fb(Y,X) (derived while propagating p^bf)
and p^fb(X,Y) ∨ q^bf(Y,X) (derived while propagating q^bf). These rules could
support two atoms p^bf(m,n) and q^bf(n,m), while in the original program p(m,n)
and q(n,m) could not hold simultaneously (due to semantic minimality), thus
changing the original semantics.

The method proposed in [33] circumvents this problem by using some auxiliary
predicates that collect all facts coming from the different adornments. For
instance, in the above example, two rules of the form collect_p(X,Y) :- p^fb(X,Y)
and collect_p(X,Y) :- p^bf(X,Y) are added for the predicate p. The main
deficiency of this approach is that collecting predicates will store a sizable superset
of all the atoms relevant to answer the given query.

An important observation is that these collecting predicates are defined in a
deterministic way. Since these predicates are used for restricting the computation
in [33], a consequence is that assumptions made during the computation cannot
be exploited for determining the relevant part of the program. In terms of
bottom-up systems, this implies that the optimization affects only the grounding
portion of the solver. Intuitively, it would be beneficial to also have a form
of conditional relevance, exploiting also relevance for assumptions. In fact, in
Section 5, we provide experimental evidence for this intuition.

Algorithm DMS($\mathcal{Q}$, $\mathcal{P}$)

Input: A Datalog$^{\vee,\neg_s}$ program $\mathcal{P}$, and a query $\mathcal{Q} = g(\bar t)$?
Output: The rewritten program DMS($\mathcal{Q}$, $\mathcal{P}$);

var: $S$, $D$: set of adorned predicates; $\mathit{modifiedRules}_{\mathcal{Q},\mathcal{P}}$, $\mathit{magicRules}_{\mathcal{Q},\mathcal{P}}$: set of rules;
begin

1. $S := \emptyset$; $D := \emptyset$; $\mathit{modifiedRules}_{\mathcal{Q},\mathcal{P}} := \emptyset$; $\mathit{magicRules}_{\mathcal{Q},\mathcal{P}} := \{$BuildQuerySeed($\mathcal{Q}$, $S$)$\}$;

2. while $S \neq \emptyset$ do

3. $p^\alpha$ := an element of $S$; remove $p^\alpha$ from $S$; add $p^\alpha$ to $D$;

4. for each rule $r \in \mathcal{P}$ and for each atom $p(\bar t) \in H(r)$ do

5. $r^a$ := Adorn($r$, $p^\alpha(\bar t)$, $S$, $D$);

6. $\mathit{magicRules}_{\mathcal{Q},\mathcal{P}} := \mathit{magicRules}_{\mathcal{Q},\mathcal{P}} \cup$ Generate($r$, $p^\alpha(\bar t)$, $r^a$);

7. $\mathit{modifiedRules}_{\mathcal{Q},\mathcal{P}} := \mathit{modifiedRules}_{\mathcal{Q},\mathcal{P}} \cup \{$Modify($r$, $r^a$)$\}$;

8. end for

9. end while

10. DMS($\mathcal{Q}$, $\mathcal{P}$) := $\mathit{magicRules}_{\mathcal{Q},\mathcal{P}} \cup \mathit{modifiedRules}_{\mathcal{Q},\mathcal{P}} \cup EDB(\mathcal{P})$;

11. return DMS($\mathcal{Q}$, $\mathcal{P}$);
end.

Fig. 1. Dynamic Magic Set algorithm (DMS) for Datalog$^{\vee,\neg_s}$ programs

In the following, we propose a novel Magic Set method that guarantees query 
equivalence and also allows for the exploitation of conditional or dynamic 
relevance, overcoming a major drawback of SMS. 

3.2 DMS Algorithm 

Our proposal to enhance the Magic Set method for disjunctive Datalog pro- 
grams has two crucial features compared to the one of [33]: 

(1) First, the semantics of the program is preserved by stripping off the adorn- 
ments from non-magic predicates in modified rules, and not by introduc- 
ing collecting predicates that can introduce overhead in the grounding 
process, as discussed in Section 3.1. 

(2) Second, the proposed Magic Set technique is not just a way to cut irrele- 
vant rules from the ground program; in fact, it allows for dynamic deter- 
mination of relevance, thus optimizing also the nondeterministic compu- 
tation by disabling parts of the programs which are not relevant in any 
extension of the current computation state. 

The algorithm DMS implementing these strategies is reported in Figure 1 as 
pseudo-code. We assume that all variables are passed to functions by reference, 
in particular the variable S is modified inside BuildQuerySeed and Adorn. 
Its input is a Datalog$^{\vee,\neg_s}$ program^ $\mathcal{P}$ and a query $\mathcal{Q}$. The algorithm uses two
sets, $S$ and $D$, to store adorned predicates to be propagated and already
processed, respectively. After all the adorned predicates have been processed,
the method outputs a rewritten program DMS($\mathcal{Q}$, $\mathcal{P}$) consisting of a set of
modified and magic rules, stored by means of the sets $\mathit{modifiedRules}_{\mathcal{Q},\mathcal{P}}$ and
$\mathit{magicRules}_{\mathcal{Q},\mathcal{P}}$, respectively (together with the original EDB). The main steps
of the algorithm are illustrated by means of the following running example.

Example 3.1 (Strategic Companies [15]) Let $C = \{c_1, \ldots, c_m\}$ be a
collection of companies producing some goods in a set $G$, such that each company
$c_i \in C$ is controlled by a set of other companies $O_i \subseteq C$. A subset of the
companies $C' \subseteq C$ is a strategic set if it is a minimal set of companies satisfying
the following conditions: Companies in $C'$ produce all the goods in $G$; and
$O_i \subseteq C'$ implies $c_i \in C'$, for each $i = 1, \ldots, m$.

We assume that each product is produced by at most two companies and
that each company is controlled by at most three companies. It is known
that the problem retains its hardness (for the second level of the polynomial
hierarchy; see [15]) under these restrictions. We assume that production of
goods is represented by an EDB containing a fact produced_by(p, c1, c2) for
each product p produced by companies c1 and c2, and that the control is
represented by facts controlled_by(c, c1, c2, c3) for each company c controlled
by companies c1, c2, and c3. ^ This problem can be modeled via the following
disjunctive program $\mathcal{P}_{sc}$:

$r_3$ : sc(C1) ∨ sc(C2) :- produced_by(P, C1, C2).

$r_4$ : sc(C) :- controlled_by(C, C1, C2, C3), sc(C1), sc(C2), sc(C3).

Moreover, given a company $c \in C$, we consider a query $\mathcal{Q}_{sc}$ = sc(c) asking
whether c belongs to some strategic set of $C$. □

The computation starts in step 1 by initializing $S$, $D$, and $\mathit{modifiedRules}_{\mathcal{Q},\mathcal{P}}$
to the empty set. Then, the function BuildQuerySeed($\mathcal{Q}$, $S$) is used for
storing in $\mathit{magicRules}_{\mathcal{Q},\mathcal{P}}$ the magic seed, and inserting in the set $S$ the adorned
predicate of $\mathcal{Q}$. Note that we do not generate any query rules because standard
atoms in the transformed program will not contain adornments. Details of
BuildQuerySeed($\mathcal{Q}$, $S$) are reported in Figure 2.

Example 3.2 Given the query $\mathcal{Q}_{sc}$ = sc(c) and the program $\mathcal{P}_{sc}$, function
BuildQuerySeed($\mathcal{Q}_{sc}$, $S$) creates the fact magic_sc^b(c) and inserts sc^b in $S$.

□

^ Note that the algorithm can be used for non-disjunctive and/or positive programs 
as a special case. 

^ If a product is produced by only one company, c2 = c1, and similarly for companies
controlled by fewer than three companies.
Function BuildQuerySeed($\mathcal{Q}$, $S$)

Input: $\mathcal{Q}$: query; $S$: set of adorned predicates;

Output: The query seed (a magic atom);

var: $\alpha$: adornment string;

begin

1. Let $p(\bar t)$ be the atom in $\mathcal{Q}$.

2. $\alpha := \epsilon$;

3. for each argument $t$ in $\bar t$ do

4. if $t$ is a constant then $\alpha := \alpha b$; else $\alpha := \alpha f$; end if

5. end for

6. add $p^\alpha$ to $S$;

7. return $\mathit{magic}(p^\alpha(\bar t))$;
end.

Fig. 2. BuildQuerySeed function
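
The function of Figure 2 translates almost line by line into executable code. The following Python sketch is an illustration under assumptions introduced here: atoms are encoded as pairs (predicate, argument tuple), variables are strings starting with an uppercase letter, and the magic predicate is named by concatenating "magic_", the predicate, "^", and the adornment.

```python
def is_constant(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return not (isinstance(term, str) and term[:1].isupper())


def build_query_seed(query, S):
    """Return the magic seed for `query` and add its adorned predicate to S."""
    pred, args = query
    # Steps 2-5: one label per argument, 'b' for constants and 'f' otherwise.
    alpha = "".join("b" if is_constant(t) else "f" for t in args)
    # Step 6: remember the adorned predicate for later propagation.
    S.add((pred, alpha))
    # Step 7: the magic version keeps only the arguments labelled 'b'.
    bound_args = tuple(t for t, label in zip(args, alpha) if label == "b")
    return ("magic_" + pred + "^" + alpha, bound_args)


S = set()
print(build_query_seed(("sc", ("c",)), S), S)
```

For the query sc(c) this yields the seed magic_sc^b(c) and inserts sc^b in S, as in Example 3.2.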



Function Adorn($r$, $p^\alpha(\bar t)$, $S$, $D$)

Input: $r$: rule; $p^\alpha(\bar t)$: adorned atom; $S$, $D$: set of adorned predicates;

Output: an adorned rule;

var: $r^a$: adorned rule; $\alpha_i$: adornment string;
begin

1. Let $(\prec_r^{p^\alpha(\bar t)}, f_r^{p^\alpha(\bar t)})$ be the SIPS associated with $r$ and $p^\alpha(\bar t)$.

2. $r^a := r$;

3. for each IDB atom $p_i(\bar t_i)$ in $H(r) \cup B^+(r) \cup B^-(r)$ do

4. $\alpha_i := \epsilon$;

5. for each argument $t$ in $\bar t_i$ do

6. if $t$ is a constant then

7. $\alpha_i := \alpha_i b$;

8. else

9. Argument $t$ is a variable. Let $X$ be this variable.

10. if $X \in f_r^{p^\alpha(\bar t)}(p(\bar t))$ or there is $q(\bar s)$ in $B^+(r)$ such that

11. $q(\bar s) \prec_r^{p^\alpha(\bar t)} p_i(\bar t_i)$ and $X \in f_r^{p^\alpha(\bar t)}(q(\bar s))$ then

12. $\alpha_i := \alpha_i b$;

13. else

14. $\alpha_i := \alpha_i f$;

15. end if

16. end if

17. end for

18. substitute $p_i(\bar t_i)$ in $r^a$ with $p_i^{\alpha_i}(\bar t_i)$;

19. if set $D$ does not contain $p_i^{\alpha_i}$ then add $p_i^{\alpha_i}$ to $S$; end if

20. end for

21. return $r^a$;
end.

Fig. 3. Adorn function

The core of the algorithm (steps 3-8) is repeated until the set $S$ is empty, i.e.,
until there is no further adorned predicate to be propagated. In particular,
an adorned predicate $p^\alpha$ is moved from $S$ to $D$ in step 3, and its binding is
propagated in each (disjunctive) rule $r \in \mathcal{P}$ of the form

$r : p(\bar t) \vee p_1(\bar t_1) \vee \cdots \vee p_n(\bar t_n)$ :- $q_1(\bar s_1), \ldots, q_j(\bar s_j),$

$\mathit{not}\ q_{j+1}(\bar s_{j+1}), \ldots, \mathit{not}\ q_m(\bar s_m).$

(with $n \geq 0$) having an atom $p(\bar t)$ in the head (note that the rule $r$ is processed
a number of times that equals the number of head atoms with predicate $p$;
steps 4-8).
(1) Adornment. Step 5 in Figure 1 implements the adornment of the rule.
Different from the case of non-disjunctive positive programs, the binding of
the predicate $p^\alpha$ needs to be also propagated to the atoms $p_1(\bar t_1), \ldots, p_n(\bar t_n)$
in the head. Therefore, binding propagation has to be extended to the head
atoms different from $p(\bar t)$, which are therefore adorned according to a SIPS
specifically conceived for disjunctive programs. Notation gets slightly more
involved here: Since in non-disjunctive rules there is a single head atom, it
was sufficient to specify an order and a function for each of its adornments
(omitting the head atom in the notation). With disjunctive rules, an order
and a function need to be specified for each adorned head atom, so it is no
longer sufficient to include only the adornment in the notation, but we rather
include the full adorned atom.

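Definition 3.3 (SIPS for Datalog$^{\vee,\neg_s}$ rules) A SIPS for a Datalog$^{\vee,\neg_s}$
rule $r$ with respect to a binding $\alpha$ for an atom $p(\bar t) \in H(r)$ is a pair
$(\prec_r^{p^\alpha(\bar t)}, f_r^{p^\alpha(\bar t)})$, where:

(1) $\prec_r^{p^\alpha(\bar t)}$ is a strict partial order over the atoms in $H(r) \cup B^+(r) \cup B^-(r)$,
such that:

(a) $p(\bar t) \prec_r^{p^\alpha(\bar t)} q(\bar s)$, for all atoms $q(\bar s) \in H(r) \cup B^+(r) \cup B^-(r)$ different
from $p(\bar t)$;

(b) for each pair of atoms $q(\bar s) \in (H(r) \setminus \{p(\bar t)\}) \cup B^-(r)$ and $b(\bar z) \in
H(r) \cup B^+(r) \cup B^-(r)$, $q(\bar s) \prec_r^{p^\alpha(\bar t)} b(\bar z)$ does not hold; and,

(2) $f_r^{p^\alpha(\bar t)}$ is a function assigning to each atom $q(\bar s) \in H(r) \cup B^+(r) \cup B^-(r)$
a subset of the variables in $\bar s$ (intuitively, those made bound when
processing $q(\bar s)$).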

As for Datalog rules, for each atom $q(\bar s)$ occurring in $r$, the strict partial order
$\prec_r^{p^\alpha(\bar t)}$ specifies those atoms that have to be processed before processing atom
$q(\bar s)$, and an argument $X$ of $q(\bar s)$ is bound to a constant if there exists an
atom $q'(\bar s')$ occurring in $r$ such that $q'(\bar s') \prec_r^{p^\alpha(\bar t)} q(\bar s)$ and $X \in f_r^{p^\alpha(\bar t)}(q'(\bar s'))$.
The difference with respect to SIPSes for Datalog rules is precisely in the
dependency on $p(\bar t)$ in addition to $\alpha$, and in condition (1.b) stating that head
atoms different from $p(\bar t)$ and negative body literals cannot provide bindings
to variables of other atoms.

The underlying idea is that a rule which is used to "prove" the truth of an 
atom in a top-down method will be a rule which supports that atom. This 
implies that all other head atoms in that rule must be false and that the body 
must be true. Head atoms and atoms occurring in the negative body cannot 
"create" bindings (that is, restrict the values of variables), but these atoms 
are still relevant to the query, which leads to the restrictions in Definition 3.3. 

Note that this definition considers each rule in isolation and is therefore
independent of the inter-rule structure of a program. In particular, it is not
important for the SIPS definition whether a program is cyclic or contains
head cycles.

In the following, we shall assume that each Datalog$^{\vee,\neg_s}$ program is provided
in input together with some arbitrarily defined SIPS for Datalog$^{\vee,\neg_s}$ rules
$(\prec_r^{p^\alpha(\bar t)}, f_r^{p^\alpha(\bar t)})$. In fact, armed with $(\prec_r^{p^\alpha(\bar t)}, f_r^{p^\alpha(\bar t)})$, the adornment can be
carried out precisely as we discussed for Datalog programs; in particular, we recall
here that a variable $X$ of an atom $q(\bar s)$ in $r$ is bound if and only if either:

(1) $X \in f_r^{p^\alpha(\bar t)}(q(\bar s))$ with $q(\bar s) = p(\bar t)$; or,

(2) $X \in f_r^{p^\alpha(\bar t)}(b(\bar z))$ for an atom $b(\bar z) \in B^+(r)$ such that $b(\bar z) \prec_r^{p^\alpha(\bar t)} q(\bar s)$
holds.

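The function Adorn($r$, $p^\alpha(\bar t)$, $S$, $D$) produces an adorned disjunctive rule $r^a$
from an adorned atom $p^\alpha(\bar t)$ and a suitable unadorned rule $r$ (according to
the bindings defined in the points (1) and (2) above), by inserting all newly
adorned predicates in $S$. Hence, in step 5 the rule $r^a$ is of the form

$r^a : p^\alpha(\bar t) \vee p_1^{\alpha_1}(\bar t_1) \vee \cdots \vee p_n^{\alpha_n}(\bar t_n)$ :- $q_1^{\beta_1}(\bar s_1), \ldots, q_j^{\beta_j}(\bar s_j),$

$\mathit{not}\ q_{j+1}^{\beta_{j+1}}(\bar s_{j+1}), \ldots, \mathit{not}\ q_m^{\beta_m}(\bar s_m).$

Details of Adorn($r$, $p^\alpha(\bar t)$, $S$, $D$) are reported in Figure 3.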

Example 3.4 Let us resume from Example 3.2. We are supposing that the 
adopted SIPS is passing the bindings via produced_by and controlled_by to 
the variables of sc atoms, in particular 
$controlled\_by(C,C1,C2,C3) \prec_{r_4}^{sc^b(C)} sc(C2)$
$controlled\_by(C,C1,C2,C3) \prec_{r_4}^{sc^b(C)} sc(C3)$

$f_{r_3}^{sc^b(C1)}(sc(C1)) = \{C1\}$
$f_{r_3}^{sc^b(C1)}(produced\_by(P,C1,C2)) = \{P, C2\}$
$f_{r_3}^{sc^b(C1)}(sc(C2)) = \emptyset$

$f_{r_3}^{sc^b(C2)}(sc(C2)) = \{C2\}$
$f_{r_3}^{sc^b(C2)}(produced\_by(P,C1,C2)) = \{P, C1\}$
$f_{r_3}^{sc^b(C2)}(sc(C1)) = \emptyset$

$f_{r_4}^{sc^b(C)}(sc(C)) = \{C\}$
$f_{r_4}^{sc^b(C)}(controlled\_by(C,C1,C2,C3)) = \{C1, C2, C3\}$
$f_{r_4}^{sc^b(C)}(sc(C1)) = f_{r_4}^{sc^b(C)}(sc(C2)) = f_{r_4}^{sc^b(C)}(sc(C3)) = \emptyset$

When sc^b is removed from the set $S$, we first select rule $r_3$ and the head
predicate sc(C1). Then the adorned version is

$r_{3,1}^a$ : sc^b(C1) ∨ sc^b(C2) :- produced_by(P, C1, C2).

Next, $r_3$ is processed again, this time with head predicate sc(C2), producing

$r_{3,2}^a$ : sc^b(C2) ∨ sc^b(C1) :- produced_by(P, C1, C2).

Finally, processing $r_4$ we obtain

$r_4^a$ : sc^b(C) :- controlled_by(C, C1, C2, C3), sc^b(C1), sc^b(C2), sc^b(C3). □

(2) Generation. The algorithm uses the adorned rule $r^a$ for generating
and collecting the magic rules in step 6 (Figure 1). More specifically,
Generate($r$, $p^\alpha(\bar t)$, $r^a$) produces magic rules according to the following schema:
if $p_i^{\alpha_i}(\bar t_i)$ is an adorned atom (i.e., $\alpha_i$ is not the empty string) occurring
in $r^a$ and different from $p^\alpha(\bar t)$, a magic rule $r^*$ is generated such that (i)
$H(r^*) = \{\mathit{magic}(p_i^{\alpha_i}(\bar t_i))\}$ and (ii) $B(r^*)$ is the union of $\{\mathit{magic}(p^\alpha(\bar t))\}$ and
the set of all the atoms $q_j(\bar s_j) \in B^+(r)$ such that $q_j(\bar s_j) \prec_r^{p^\alpha(\bar t)} p_i(\bar t_i)$. Details of
Generate($r$, $p^\alpha(\bar t)$, $r^a$) are reported in Figure 4.

Example 3.5 Continuing with our running example, by invoking
Generate($r_3$, sc^b(C1), $r_{3,1}^a$), the following magic rule is produced:

$r_{3,1}^*$ : magic_sc^b(C2) :- magic_sc^b(C1), produced_by(P, C1, C2).
Function Generate($r$, $p^\alpha(\bar t)$, $r^a$)

Input: $r$: rule; $p^\alpha(\bar t)$: adorned atom; $r^a$: adorned rule;
Output: a set of magic rules;
var: $R$: set of rules; $r^*$: rule;
begin

1. Let $(\prec_r^{p^\alpha(\bar t)}, f_r^{p^\alpha(\bar t)})$ be the SIPS associated with $r$ and $p^\alpha(\bar t)$.

2. $R := \emptyset$;

3. for each atom $p_i^{\alpha_i}(\bar t_i)$ in $H(r^a) \cup B^+(r^a) \cup B^-(r^a)$ different from $p^\alpha(\bar t)$ do

4. if $\alpha_i \neq \epsilon$ then

5. $r^* := \mathit{magic}(p_i^{\alpha_i}(\bar t_i))$ :- $\mathit{magic}(p^\alpha(\bar t))$;

6. for each atom $p_j(\bar t_j)$ in $B^+(r)$ such that $p_j(\bar t_j) \prec_r^{p^\alpha(\bar t)} p_i(\bar t_i)$ do

7. add atom $p_j(\bar t_j)$ to $B^+(r^*)$;

8. end for

9. $R := R \cup \{r^*\}$;

10. end if

11. end for

12. return $R$;
end.

Fig. 4. Generate function

Similarly, by invoking Generate($r_3$, sc^b(C2), $r_{3,2}^a$), the following magic rule is
produced:

$r_{3,2}^*$ : magic_sc^b(C1) :- magic_sc^b(C2), produced_by(P, C1, C2).

Finally, the following magic rules are produced by Generate($r_4$, sc^b(C), $r_4^a$):

$r_{4,1}^*$ : magic_sc^b(C1) :- magic_sc^b(C), controlled_by(C, C1, C2, C3).
$r_{4,2}^*$ : magic_sc^b(C2) :- magic_sc^b(C), controlled_by(C, C1, C2, C3).
$r_{4,3}^*$ : magic_sc^b(C3) :- magic_sc^b(C), controlled_by(C, C1, C2, C3). □

(3) Modification. In step 7 the modified rules are generated and collected.

The only difference with respect to the Datalog case is that the adornments are
stripped off the original atoms. Specifically, given an adorned rule $r^a$ associated
with a rule $r$, a modified rule $r'$ is obtained from $r$ by adding to its body an
atom $\mathit{magic}(p^\alpha(\bar t))$ for each atom $p^\alpha(\bar t)$ occurring in $H(r^a)$. Hence, the function
Modify($r$, $r^a$), reported in Figure 5, constructs a rule $r'$ of the form

$r' : p(\bar t) \vee p_1(\bar t_1) \vee \cdots \vee p_n(\bar t_n)$ :- $\mathit{magic}(p^\alpha(\bar t)), \mathit{magic}(p_1^{\alpha_1}(\bar t_1)), \ldots,$

$\mathit{magic}(p_n^{\alpha_n}(\bar t_n)), q_1(\bar s_1), \ldots, q_j(\bar s_j), \mathit{not}\ q_{j+1}(\bar s_{j+1}), \ldots, \mathit{not}\ q_m(\bar s_m).$

Finally, after all the adorned predicates have been processed, the algorithm
outputs the program DMS($\mathcal{Q}$, $\mathcal{P}$).

Example 3.6 In our running example, we derive the following set of modified 
rules: 
Function Modify($r$, $r^a$)
Input: $r$: rule; $r^a$: adorned rule;
Output: a modified rule;
var: $r'$: rule;
begin

1. $r' := r$;

2. for each atom $p^\alpha(\bar t)$ in $H(r^a)$ do

3. add $\mathit{magic}(p^\alpha(\bar t))$ to $B^+(r')$;

4. end for

5. return $r'$;
end.

Fig. 5. Modify function

$r_{3,1}'$ : sc(C1) ∨ sc(C2) :- magic_sc^b(C1), magic_sc^b(C2),
produced_by(P, C1, C2).

$r_{3,2}'$ : sc(C2) ∨ sc(C1) :- magic_sc^b(C2), magic_sc^b(C1),
produced_by(P, C1, C2).

$r_4'$ : sc(C) :- magic_sc^b(C), controlled_by(C, C1, C2, C3),
sc(C1), sc(C2), sc(C3).

Here, $r_{3,1}'$ (resp. $r_{3,2}'$, $r_4'$) is derived by adding magic predicates and stripping
off adornments for the rule $r_{3,1}^a$ (resp. $r_{3,2}^a$, $r_4^a$). Thus, the optimized program
DMS($\mathcal{Q}_{sc}$, $\mathcal{P}_{sc}$) comprises the above modified rules as well as the magic rules
in Example 3.5, and the magic seed magic_sc^b(c) (together with the original
EDB). □
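
The whole rewriting of the running example can be reproduced with a compact sketch of the loop in Figure 1. The Python code below is illustrative and rests on assumptions that are not part of the algorithm: it hard-wires one particular SIPS (the selected head atom first, then all positive EDB body atoms, which bind all of their variables, then every other atom), encodes atoms as (predicate, argument tuple) pairs and rules as (head, positive body, negative body) triples, and merges Adorn, Generate and Modify into a single function.

```python
from collections import deque


def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()


def adornment(atom, bound_vars):
    # 'b' for constants and bound variables, 'f' for free variables.
    return "".join("f" if is_var(t) and t not in bound_vars else "b" for t in atom[1])


def magic(atom, alpha):
    # Magic version of an adorned atom: keep only the arguments labelled 'b'.
    pred, args = atom
    kept = tuple(t for t, label in zip(args, alpha) if label == "b")
    return ("magic_" + pred + "^" + alpha, kept)


def dms(rules, idb, query):
    """Return (magic rules, modified rules) under the simplified SIPS above."""
    alpha0 = adornment(query, set())
    magic_rules = [([magic(query, alpha0)], [], [])]  # magic seed (a fact)
    modified_rules = []
    todo, done = deque([(query[0], alpha0)]), set()
    while todo:
        pred, alpha = todo.popleft()
        if (pred, alpha) in done:
            continue
        done.add((pred, alpha))
        for head, pos, neg in rules:
            for k, chosen in enumerate(head):
                if chosen[0] != pred:
                    continue
                edb = [a for a in pos if a[0] not in idb]
                # Variables bound by the selected head atom and by EDB body atoms.
                bound = {t for t, l in zip(chosen[1], alpha) if l == "b" and is_var(t)}
                for a in edb:
                    bound.update(t for t in a[1] if is_var(t))
                seed = magic(chosen, alpha)
                head_magic = []
                for i, atom in enumerate(head):
                    if i == k:
                        head_magic.append(seed)
                        continue
                    beta = adornment(atom, bound)
                    todo.append((atom[0], beta))
                    head_magic.append(magic(atom, beta))
                    # Head-to-head propagation: magic rule for another head atom.
                    magic_rules.append(([magic(atom, beta)], [seed] + edb, []))
                for atom in pos + neg:
                    if atom[0] in idb:
                        beta = adornment(atom, bound)
                        todo.append((atom[0], beta))
                        magic_rules.append(([magic(atom, beta)], [seed] + edb, []))
                # Modified rule: original (unadorned) rule plus magic atoms for the head.
                modified_rules.append((head, head_magic + pos, neg))
    return magic_rules, modified_rules


p_sc = [
    ([("sc", ("C1",)), ("sc", ("C2",))], [("produced_by", ("P", "C1", "C2"))], []),
    ([("sc", ("C",))],
     [("controlled_by", ("C", "C1", "C2", "C3")),
      ("sc", ("C1",)), ("sc", ("C2",)), ("sc", ("C3",))], []),
]
magic_rules, modified_rules = dms(p_sc, {"sc"}, ("sc", ("c",)))
for rule in magic_rules + modified_rules:
    print(rule)
```

With this SIPS the sketch produces the magic seed, the five magic rules of Example 3.5, and three modified rules. The two modified rules obtained from $r_3$ differ from $r_{3,1}'$ and $r_{3,2}'$ only in the order in which atoms are listed.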

Before establishing the correctness of the technique, we briefly present an
example of the application of DMS on a program containing disjunction and
stratified negation.

Example 3.7 Let us consider a slight variant of the Strategic Companies
problem described in Example 3.1 in which we have to determine whether a
given company c does not belong to any strategic set. We can thus consider
the query nsc(c) for the program $\mathcal{P}_{nsc}$ obtained by adding to $\mathcal{P}_{sc}$ the following
rule:

$r_{nsc}$ : nsc(C) :- company(C), not sc(C).

where company is an EDB predicate. Company c does not belong to any 
strategic set if the query is cautiously false. 

In this case, processing the query produces the query seed magic_nsc^b(c) (a
fact) and the adorned predicate nsc^b (which is added to set $S$). After that,
nsc^b is moved from $S$ to $D$ and rule $r_{nsc}$ is considered. Assuming the following
SIPS:

$nsc(C) \prec_{r_{nsc}}^{nsc^b(C)} company(C)$    $nsc(C) \prec_{r_{nsc}}^{nsc^b(C)} sc(C)$

$f_{r_{nsc}}^{nsc^b(C)}(nsc(C)) = \{C\}$    $f_{r_{nsc}}^{nsc^b(C)}(company(C)) = f_{r_{nsc}}^{nsc^b(C)}(sc(C)) = \emptyset$

by invoking Adorn($r_{nsc}$, nsc^b(C), $S$, $D$) we obtain the following adorned rule:

$r_{nsc}^a$ : nsc^b(C) :- company(C), not sc^b(C).

The new adorned predicate sc^b is added to $S$. Then,
Generate($r_{nsc}$, nsc^b(C), $r_{nsc}^a$) and Modify($r_{nsc}$, $r_{nsc}^a$) produce the following magic
and modified rules:

$r_{nsc}^*$ : magic_sc^b(C) :- magic_nsc^b(C).

$r_{nsc}'$ : nsc(C) :- magic_nsc^b(C), company(C), not sc(C).

The algorithm then processes the adorned atom sc^b. Hence, if the SIPS
presented in Example 3.4 is assumed, the rewritten program comprises the
following rules: $r_{nsc}'$, $r_{3,1}'$, $r_{3,2}'$, $r_4'$, $r_{nsc}^*$, $r_{3,1}^*$, $r_{3,2}^*$, $r_{4,1}^*$, $r_{4,2}^*$ and $r_{4,3}^*$. □

3.3 Query Equivalence Result 

We conclude the presentation of the DMS algorithm by formally proving its 
correctness. We would like to point out that all of these results hold for any 
kind of SIPS, as long as it conforms to Definition 3.3. Therefore, in the remain- 
der of this section, we assume that any program comes with some associated 
SIPS. In the proofs, we use the well established notion of unfounded set for 
disjunctive Datalog programs (possibly with negation) defined in [44]. Before 
introducing unfounded sets, however, we have to define partial interpretations, 
that is, interpretations for which some atoms may be undefined. 

Definition 3.8 (Partial Interpretation) Let $\mathcal{P}$ be a Datalog$^{\vee,\neg}$ program.
A partial interpretation for $\mathcal{P}$ is a pair $(T, N)$ such that $T \subseteq N \subseteq B_{\mathcal{P}}$. The
atoms in $T$ are interpreted as true, while the atoms in $N$ are not false and
those in $N \setminus T$ are undefined. All other atoms are false.

Note that total interpretations are a special case in which $T = N$. We can
then formalize the notion of unfounded set.

Definition 3.9 (Unfounded Sets) Let $(T, N)$ be a partial interpretation for
a Datalog$^{\vee,\neg}$ program $\mathcal{P}$, and $X \subseteq B_{\mathcal{P}}$ be a set of atoms. Then, $X$ is an
unfounded set for $\mathcal{P}$ with respect to $(T, N)$ if and only if, for each ground rule
$r_g \in Ground(\mathcal{P})$ with $X \cap H(r_g) \neq \emptyset$, at least one of the following conditions
holds: (1.a) $B^+(r_g) \not\subseteq N$; (1.b) $B^-(r_g) \cap T \neq \emptyset$; (2) $B^+(r_g) \cap X \neq \emptyset$; (3)
$H(r_g) \cap (T \setminus X) \neq \emptyset$.

Intuitively, conditions (1.a), (1.b) and (3) check if the rule is satisfied by $(T, N)$
regardless of the atoms in $X$, while condition (2) checks whether the rule can
be satisfied by taking the atoms in $X$ as false.

Example 3.10 Consider again the program $\mathcal{P}_{sc}$ of Example 3.1 and assume
$EDB(\mathcal{P}_{sc})$ = {produced_by(p, c, c1)}. Then $Ground(\mathcal{P}_{sc})$ consists of the rule

$r_{sc}$ : sc(c) ∨ sc(c1) :- produced_by(p, c, c1).

(together with facts, and rules having some ground instance of EDB predicate
not occurring in $EDB(\mathcal{P}_{sc})$, omitted for simplicity). Consider now a partial
interpretation $(M_{sc}, B_{\mathcal{P}_{sc}})$ such that $M_{sc}$ = {produced_by(p, c, c1), sc(c)}.
Thus, {sc(c1)} is an unfounded set for $\mathcal{P}_{sc}$ with respect to $(M_{sc}, B_{\mathcal{P}_{sc}})$ ($r_{sc}$
satisfies condition (3) of Definition 3.9), while {sc(c), sc(c1)} is not ($r_{sc}$ violates
all conditions). □
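
Definition 3.9 can be checked mechanically on ground programs. The following Python sketch is an illustration (the function name and the encoding of ground rules as triples of sets are assumptions made here). It tests the four conditions rule by rule and replays Example 3.10; the set N is given as a finite set that contains all atoms occurring in the rules, standing in for the full base.

```python
def is_unfounded(X, ground_rules, T, N):
    """Check whether X is an unfounded set with respect to the partial interpretation (T, N).

    Each ground rule is a triple (head, positive body, negative body) of sets
    of ground atoms, here encoded as strings.
    """
    for head, pos, neg in ground_rules:
        if not (X & head):
            continue                # only rules with a head atom in X matter
        if not pos <= N:
            continue                # (1.a) some positive body atom is false
        if neg & T:
            continue                # (1.b) some negative body atom is true
        if pos & X:
            continue                # (2) the positive body relies on X itself
        if head & (T - X):
            continue                # (3) a true head atom outside X
        return False                # no condition holds for this rule
    return True


r_sc = ({"sc(c)", "sc(c1)"}, {"produced_by(p,c,c1)"}, set())
T = {"produced_by(p,c,c1)", "sc(c)"}
N = {"produced_by(p,c,c1)", "sc(c)", "sc(c1)"}
print(is_unfounded({"sc(c1)"}, [r_sc], T, N))
print(is_unfounded({"sc(c)", "sc(c1)"}, [r_sc], T, N))
```

The first call succeeds through condition (3), whereas for {sc(c), sc(c1)} none of the conditions applies to the rule, in agreement with the example.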

The following is an adaptation of Theorem 4.6 in [44] to our notation. 

Theorem 3.11 ([44]) Let $(T, N)$ be a partial interpretation for a Datalog$^{\vee,\neg}$
program $\mathcal{P}$. Then, for any stable model $M$ of $\mathcal{P}$ such that $T \subseteq M \subseteq N$, and
for each unfounded set $X$ of $\mathcal{P}$ with respect to $(T, N)$, $M \cap X = \emptyset$ holds.

Example 3.12 In Example 3.10, we have shown that {sc(c1)} is an
unfounded set for $\mathcal{P}_{sc}$ with respect to $(M_{sc}, B_{\mathcal{P}_{sc}})$. Note that the total
interpretation $M_{sc}$ is a stable model of $\mathcal{P}_{sc}$, and that the unfounded set {sc(c1)} is
disjoint from $M_{sc}$. □

Equipped with these notions and Theorem 3.11, we now proceed to prove
the correctness of the DMS strategy. In particular, we shall first show that the
method is sound in that, for each stable model $M$ of DMS($\mathcal{Q}$, $\mathcal{P}$), there is a
stable model $M'$ of $\mathcal{P}$ such that $M'|_{\mathcal{Q}} = M|_{\mathcal{Q}}$ (i.e., the two models coincide
when restricted to the query). Then, we prove that the method is also complete,
i.e., for each stable model $M'$ of $\mathcal{P}$, there is a stable model $M$ of DMS($\mathcal{Q}$, $\mathcal{P}$)
such that $M'|_{\mathcal{Q}} = M|_{\mathcal{Q}}$.

In both parts of the proof, we shall exploit the following (syntactic) relation- 
ship between the original program and the transformed one. 

Lemma 3.13 Let $\mathcal{P}$ be a Datalog$^{\vee,\neg_s}$ program, $\mathcal{Q}$ a query, and let
$\mathit{magic}(p^\alpha(\bar t))$ be a ground atom^ in $B_{DMS(\mathcal{Q},\mathcal{P})}$ (the base of the transformed

^ Note that in this way the lemma refers only to rules that contain a head atom 
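program). Then the ground rule

$r_g : p(\bar t) \vee p_1(\bar t_1) \vee \cdots \vee p_n(\bar t_n)$ :- $q_1(\bar s_1), \ldots, q_j(\bar s_j),$

$\mathit{not}\ q_{j+1}(\bar s_{j+1}), \ldots, \mathit{not}\ q_m(\bar s_m).$

belongs to $Ground(\mathcal{P})$ if and only if the ground rule

$r_g' : p(\bar t) \vee p_1(\bar t_1) \vee \cdots \vee p_n(\bar t_n)$ :- $\mathit{magic}(p^\alpha(\bar t)), \mathit{magic}(p_1^{\alpha_1}(\bar t_1)), \ldots,$

$\mathit{magic}(p_n^{\alpha_n}(\bar t_n)), q_1(\bar s_1), \ldots, q_j(\bar s_j), \mathit{not}\ q_{j+1}(\bar s_{j+1}), \ldots, \mathit{not}\ q_m(\bar s_m).$

belongs to $Ground(DMS(\mathcal{Q},\mathcal{P}))$.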

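Proof. ($\Rightarrow$) Consider the following rule $r \in \mathcal{P}$ such that $r_g = r\vartheta$ for some
substitution $\vartheta$:

$r : p(\bar t') \vee p_1(\bar t_1') \vee \cdots \vee p_n(\bar t_n')$ :- $q_1(\bar s_1'), \ldots, q_j(\bar s_j'),$

$\mathit{not}\ q_{j+1}(\bar s_{j+1}'), \ldots, \mathit{not}\ q_m(\bar s_m').$

Since $\mathit{magic}(p^\alpha(\bar t))$ is a ground atom in $B_{DMS(\mathcal{Q},\mathcal{P})}$, $p^\alpha$ has been inserted in the
set $S$ at some point of the Magic Set transformation, and it has eventually
been used to adorn and modify $r$, thereby producing the following rule $r' \in
DMS(\mathcal{Q},\mathcal{P})$:

$r' : p(\bar t') \vee p_1(\bar t_1') \vee \cdots \vee p_n(\bar t_n')$ :- $\mathit{magic}(p^\alpha(\bar t')), \mathit{magic}(p_1^{\alpha_1}(\bar t_1')), \ldots,$

$\mathit{magic}(p_n^{\alpha_n}(\bar t_n')), q_1(\bar s_1'), \ldots, q_j(\bar s_j'), \mathit{not}\ q_{j+1}(\bar s_{j+1}'), \ldots, \mathit{not}\ q_m(\bar s_m').$

Clearly enough, the substitution $\vartheta$ mapping $r$ into $r_g$ can also be used to map
$r'$ into $r_g'$, since the magic atoms added into the positive body of $r'$ are defined
over a subset of the variables occurring in head atoms.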

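($\Leftarrow$) Let $r' \in DMS(\mathcal{Q},\mathcal{P})$ be a rule such that $r_g' = r'\vartheta$ for some substitution $\vartheta$:

$r' : p(\bar t') \vee p_1(\bar t_1') \vee \cdots \vee p_n(\bar t_n')$ :- $\mathit{magic}(p^\alpha(\bar t')), \mathit{magic}(p_1^{\alpha_1}(\bar t_1')), \ldots,$

$\mathit{magic}(p_n^{\alpha_n}(\bar t_n')), q_1(\bar s_1'), \ldots, q_j(\bar s_j'), \mathit{not}\ q_{j+1}(\bar s_{j+1}'), \ldots, \mathit{not}\ q_m(\bar s_m').$

By the construction of DMS($\mathcal{Q}$, $\mathcal{P}$), $r'$ is a modified rule produced by adding
some magic atom to the positive body of a rule $r \in \mathcal{P}$ of the form:

$r : p(\bar t') \vee p_1(\bar t_1') \vee \cdots \vee p_n(\bar t_n')$ :- $q_1(\bar s_1'), \ldots, q_j(\bar s_j'),$

$\mathit{not}\ q_{j+1}(\bar s_{j+1}'), \ldots, \mathit{not}\ q_m(\bar s_m').$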

for which a magic predicate has been generated during the transformation. 
Thus, the substitution $\vartheta$ mapping $r'$ to $r_g'$ can also be used to map $r$ to $r_g$,
since $r$ and $r'$ have the same variables. □



3.3.1 Soundness of the Magic Set Method 

Let us now start with the first part of the proof, in particular, by stating
some further definitions and notations. Given a model $M'$ of DMS($\mathcal{Q}$, $\mathcal{P}$), and
a model $N' \subseteq M'$ of $Ground(DMS(\mathcal{Q},\mathcal{P}))^{M'}$, we next define the set of atoms
which are relevant for $\mathcal{Q}$ but are false with respect to $N'$.

Definition 3.14 (Killed Atoms) Given a model $M'$ for DMS($\mathcal{Q}$, $\mathcal{P}$), and a
model $N' \subseteq M'$ of $Ground(DMS(\mathcal{Q},\mathcal{P}))^{M'}$, the set $killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ of the killed
atoms with respect to $M'$ and $N'$ is defined as:

$\{k(\bar t) \in B_{\mathcal{P}} \setminus N' \mid$ either $k$ is an EDB predicate, or
there is a binding $\alpha$ such that $\mathit{magic}(k^\alpha(\bar t)) \in N'\}$.

Example 3.15 We consider the program DMS($\mathcal{Q}_{sc}$, $\mathcal{P}_{sc}$) presented
in Section 3.2 (we recall that $\mathcal{Q}_{sc}$ = sc(c)), the EDB
{produced_by(p, c, c1)} introduced in Example 3.10, and a stable model
$M_{sc}'$ = {produced_by(p, c, c1), sc(c), magic_sc^b(c), magic_sc^b(c1)} for
DMS($\mathcal{Q}_{sc}$, $\mathcal{P}_{sc}$). Thus, $Ground(DMS(\mathcal{Q}_{sc},\mathcal{P}_{sc}))^{M_{sc}'}$ consists of the following rules:

magic_sc^b(c).    magic_sc^b(c1) :- magic_sc^b(c).

sc(c) ∨ sc(c1) :- magic_sc^b(c), magic_sc^b(c1), produced_by(p, c, c1).

Since $M_{sc}'$ is also a model of the program above, we can compute
$killed^{M_{sc}'}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc}')$ and check that sc(c1) belongs to it because of magic_sc^b(c1)
in $M_{sc}'$. Note that, by definition, also false ground instances of EDB
predicates like produced_by(p, c1, c) or controlled_by(c, c1, c1, c1) belong to
$killed^{M_{sc}'}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc}')$. Moreover, note that no other atom belongs to this set. □
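
The set of Definition 3.14 is easy to compute once the base is restricted to finitely many candidate atoms. The Python sketch below is illustrative only: the encoding of atoms as (predicate, argument tuple) pairs, the naming scheme "magic_" + predicate + "^" + adornment for magic predicates, and the restriction to an explicitly given finite set of candidate atoms are assumptions made here.

```python
def killed_atoms(base, n_prime, edb_preds):
    """Killed atoms with respect to N', restricted to the finite set `base`."""
    def has_magic(pred, args):
        # Look for some adornment alpha such that magic(pred^alpha(args)) is in N'.
        prefix = "magic_" + pred + "^"
        for name, magic_args in n_prime:
            if not name.startswith(prefix):
                continue
            alpha = name[len(prefix):]
            projected = tuple(t for t, label in zip(args, alpha) if label == "b")
            if len(alpha) == len(args) and magic_args == projected:
                return True
        return False

    return {(p, a) for (p, a) in base
            if (p, a) not in n_prime and (p in edb_preds or has_magic(p, a))}


fact = ("produced_by", ("p", "c", "c1"))
m_sc = {fact, ("sc", ("c",)), ("magic_sc^b", ("c",)), ("magic_sc^b", ("c1",))}
base = {fact, ("produced_by", ("p", "c1", "c")), ("sc", ("c",)), ("sc", ("c1",))}
print(killed_atoms(base, m_sc, {"produced_by", "controlled_by"}))
```

On the data of Example 3.15 the result contains sc(c1), because magic_sc^b(c1) belongs to the model, together with the false EDB instance produced_by(p, c1, c) that was included among the candidates.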

The intuition underlying the definition above is that killed atoms are either
false ground instances of some EDB predicate, or false atoms which are
relevant with respect to $\mathcal{Q}$ (for there exists an associated magic atom in the model
$N'$); since $N'$ is a model of $Ground(DMS(\mathcal{Q},\mathcal{P}))^{M'}$ contained in $M'$, we expect
that these atoms are also false in any stable model for $\mathcal{P}$ containing $M'|_{B_{\mathcal{P}}}$
(which, we recall here, is the model $M'$ restricted on the atoms originally
occurring in $\mathcal{P}$).

Example 3.16 Let us resume from Example 3.15. We have that $M_{sc}'|_{B_{\mathcal{P}_{sc}}}$ =
{produced_by(p, c, c1), sc(c)}, which coincides with model $M_{sc}$ of
Example 3.10. Hence, we already know that {sc(c1)} is an unfounded set for $\mathcal{P}_{sc}$
with respect to $(M_{sc}, B_{\mathcal{P}_{sc}})$. Since each other atom $k(\bar t)$ in $killed^{M_{sc}'}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc}')$
is such that $k$ is an EDB predicate, we also have that $killed^{M_{sc}'}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc}')$ is an
unfounded set for $\mathcal{P}_{sc}$ with respect to $(M_{sc}, B_{\mathcal{P}_{sc}})$. Therefore, as a consequence
of Theorem 3.11, each stable model $M$ of $\mathcal{P}_{sc}$ such that $M_{sc} \subseteq M \subseteq B_{\mathcal{P}_{sc}}$ (in
this case only $M_{sc}$ itself) is disjoint from $killed^{M_{sc}'}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc}')$. □

This intuition is formalized below. 

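Proposition 3.17 Let $M'$ be a model for $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$, and $N' \subseteq M'$ be a model
of $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$. Then, $killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ is an unfounded set for $\mathcal{P}$
with respect to $\langle M'|_{B_{\mathcal{P}}}, B_{\mathcal{P}}\rangle$.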

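Proof. According to Definition 3.9 of unfounded sets (for $\mathcal{P}$ with respect to
$\langle M'|_{B_{\mathcal{P}}}, B_{\mathcal{P}}\rangle$), given any rule in $Ground(\mathcal{P})$ of the form

\[ r_g:\ k(\bar{t}) \vee p_1(\bar{t}_1) \vee \cdots \vee p_n(\bar{t}_n) \ \mathtt{:-}\ q_1(\bar{s}_1), \ldots, q_j(\bar{s}_j),
   \mathtt{not}\ q_{j+1}(\bar{s}_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}_m). \]

we have to show that if $k(\bar{t}) \in killed^{M'}_{\mathcal{Q},\mathcal{P}}(N') \cap H(r_g)$, then at least one of the
following conditions holds: (1.a) $B^+(r_g) \not\subseteq B_{\mathcal{P}}$; (1.b) $B^-(r_g) \cap M'|_{B_{\mathcal{P}}} \neq \emptyset$;
(2) $B^+(r_g) \cap killed^{M'}_{\mathcal{Q},\mathcal{P}}(N') \neq \emptyset$; (3) $H(r_g) \cap (M'|_{B_{\mathcal{P}}} \setminus killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')) \neq \emptyset$.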

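Note that the properties above refer to the original program $\mathcal{P}$. However, our
hypothesis is formulated over the transformed one $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ (for instance,
we know that $M'$ is a model of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$). The line of the proof is then to
analyze $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ in the light of its syntactic relationships with $\mathcal{P}$
established via Lemma 3.13. In particular, recall first that, by Definition 3.14,
there is a binding $\alpha$ such that $magic(k^{\alpha}(\bar{t})) \in N'$ (and, hence,
$magic(k^{\alpha}(\bar{t}))$ is a ground atom in $B_{\mathtt{DMS}(\mathcal{Q},\mathcal{P})}$). Thus, we can apply
Lemma 3.13 and conclude the existence of a ground rule
$r'_g \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))$ such that:

\[ r'_g:\ k(\bar{t}) \vee p_1(\bar{t}_1) \vee \cdots \vee p_n(\bar{t}_n) \ \mathtt{:-}\ magic(k^{\alpha}(\bar{t})), magic(p_1^{\alpha_1}(\bar{t}_1)), \ldots,
   magic(p_n^{\alpha_n}(\bar{t}_n)), q_1(\bar{s}_1), \ldots, q_j(\bar{s}_j), \mathtt{not}\ q_{j+1}(\bar{s}_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}_m). \]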

Since $M'$ is a model of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$, the proof is just based on analyzing the
following three scenarios that exhaustively cover all possibilities (concerning
the fact that the rule $r'_g$ is satisfied by $M'$):

(S1) $B^-(r'_g) \cap M' \neq \emptyset$, i.e., the negative body of $r'_g$ is false with respect to
$M'$;

(S2) $B^+(r'_g) \not\subseteq M'$, i.e., the positive body of $r'_g$ is false with respect to $M'$;

(S3) $B^-(r'_g) \cap M' = \emptyset$, $B^+(r'_g) \subseteq M'$, and $H(r'_g) \cap M' \neq \emptyset$, i.e., none of the
previous cases holds, and hence the head of $r'_g$ is true with respect to $M'$.

In the remainder, we shall show that (S1) implies condition (1.b), (S2) implies
condition (2), and (S3) implies either (2) or (3). In fact, note that condition
(1.a) cannot hold.

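(S1) Assume that $B^-(r'_g) \cap M' \neq \emptyset$. Since $B^-(r_g) = B^-(r'_g)$ and
$B^-(r_g) \subseteq B_{\mathcal{P}}$, from $B^-(r'_g) \cap M' \neq \emptyset$ we immediately conclude
$B^-(r_g) \cap M'|_{B_{\mathcal{P}}} \neq \emptyset$, i.e., (1.b) holds.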

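(S2) Assume that $B^+(r'_g) \not\subseteq M'$, and let $r' \in \mathtt{DMS}(\mathcal{Q},\mathcal{P})$ be a modified rule
such that $r'_g = r'\vartheta$ for some substitution $\vartheta$:

\[ r':\ k(\bar{t}') \vee p_1(\bar{t}'_1) \vee \cdots \vee p_n(\bar{t}'_n) \ \mathtt{:-}\ magic(k^{\alpha}(\bar{t}')), magic(p_1^{\alpha_1}(\bar{t}'_1)), \ldots,
   magic(p_n^{\alpha_n}(\bar{t}'_n)), q_1(\bar{s}'_1), \ldots, q_j(\bar{s}'_j), \mathtt{not}\ q_{j+1}(\bar{s}'_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}'_m). \]

We first claim that $B^+(r'_g)|_{B_{\mathcal{P}}} \not\subseteq N'$ must hold in this case. To prove the
claim, observe that during the Generation step preceding the production
of $r'$, a magic rule $r^*_i$ such that $H(r^*_i) = \{magic(p_i^{\alpha_i}(\bar{t}'_i))\}$ and
$B^+(r^*_i) \subseteq \{magic(k^{\alpha}(\bar{t}')), q_1(\bar{s}'_1), \ldots, q_j(\bar{s}'_j)\}$ has been produced for each
$1 \le i \le n$ (we recall that magic rules have empty negative bodies). Hence, since
the variables of $r^*_i$ are a subset of the variables of $r'$, by applying the
substitution $\vartheta$ to $r^*_i$ we obtain a ground rule $r^*_{i,g}$ such that
$H(r^*_{i,g}) = \{magic(p_i^{\alpha_i}(\bar{t}_i))\}$ and
$B^+(r^*_{i,g}) \subseteq \{magic(k^{\alpha}(\bar{t})), q_1(\bar{s}_1), \ldots, q_j(\bar{s}_j)\} = \{magic(k^{\alpha}(\bar{t}))\} \cup B^+(r'_g)|_{B_{\mathcal{P}}}$.

Thus, if $B^+(r'_g)|_{B_{\mathcal{P}}} \subseteq N'$, from the above magic rules and since $N'$ is a
model containing $magic(k^{\alpha}(\bar{t}))$ by assumption, then we would conclude that
$B^+(r'_g) \subseteq N'$. However, this is impossible, since $N' \subseteq M'$ and $B^+(r'_g) \not\subseteq M'$
imply $B^+(r'_g) \not\subseteq N'$.

Now, $B^+(r'_g)|_{B_{\mathcal{P}}} \not\subseteq N'$ implies the existence of an atom
$q_i(\bar{s}_i) \in B^+(r'_g)|_{B_{\mathcal{P}}}$ such that $q_i(\bar{s}_i) \notin N'$, that is, $q_i(\bar{s}_i) \in B_{\mathcal{P}} \setminus N'$.
In particular, we can assume w.l.o.g. that, for any $q(\bar{s}) \in B^+(r'_g)|_{B_{\mathcal{P}}}$ with
$q(\bar{s}') \prec^{k^{\alpha}(\bar{t}')}_{r} q_i(\bar{s}'_i)$, it is the case that $q(\bar{s}) \in N'$, where $r$ is the rule
in $\mathcal{P}$ from which the modified rule $r'$ has been generated (just take a
$\prec^{k^{\alpha}(\bar{t}')}_{r}$-minimum element in $B^+(r'_g)|_{B_{\mathcal{P}}} \setminus N'$).
If $q_i$ is an EDB predicate, the atom $q_i(\bar{s}_i)$ belongs to $killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ by the
definition of killed atoms. Otherwise, $q_i$ is an IDB predicate. In this case,
there is a magic rule $r^*_i$, produced during the Generation step preceding
the production of $r'$, such that $H(r^*_i) = \{magic(q_i^{\beta_i}(\bar{s}'_i))\}$ and
$B(r^*_i) = \{magic(k^{\alpha}(\bar{t}'))\} \cup \{q(\bar{s}') \in B^+(r) \mid q(\bar{s}') \prec^{k^{\alpha}(\bar{t}')}_{r} q_i(\bar{s}'_i)\}$.
Thus, $r^*_{i,g} = r^*_i\vartheta$ belongs to $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))$. In particular,
$B^+(r^*_{i,g}) \subseteq N'$ holds because $magic(k^{\alpha}(\bar{t}))$ belongs to $N'$ and by the
properties of $q_i(\bar{s}_i)$. Therefore, since $N'$ is a model of
$Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$, $magic(q_i^{\beta_i}(\bar{s}_i))$ belongs to $N'$, from which
$q_i(\bar{s}_i) \in killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ follows from the definition of killed atoms.
Thus, independently of the type (EDB, IDB) of $q_i$, (2) holds.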

(S3) Assume that $B^+(r'_g) \subseteq M'$, $B^-(r'_g) \cap M' = \emptyset$, and $H(r'_g) \cap M' \neq \emptyset$. First,
observe that from $B^-(r'_g) \cap M' = \emptyset$ we can conclude that there is a rule
in $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$ obtained from $r'_g$ by removing its negative body
literals. Consider now the rules $r^*_{i,g}$ produced during the Generation step,
for each $1 \le i \le n$ (as in (S2)). We distinguish two cases.

If $\{q_1(\bar{s}_1), \ldots, q_j(\bar{s}_j)\} \subseteq N'$, since $magic(k^{\alpha}(\bar{t})) \in N'$, we can conclude
that $B^+(r^*_{i,g}) \subseteq N'$, for each $1 \le i \le n$. Moreover, since $N'$ is a model of
$Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$, the latter implies that $magic(p_i^{\alpha_i}(\bar{t}_i)) \in N'$, for each
$1 \le i \le n$. Then $B^+(r'_g) \subseteq N'$ holds, and so $H(r'_g) \cap N' \neq \emptyset$ (because $N'$ is
a model of $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$). We now observe that
$H(r'_g) \cap (M'|_{B_{\mathcal{P}}} \setminus killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')) \neq \emptyset$ is equivalent to
$(H(r'_g) \cap M'|_{B_{\mathcal{P}}}) \setminus killed^{M'}_{\mathcal{Q},\mathcal{P}}(N') \neq \emptyset$.
Moreover, the latter is equivalent to $(H(r_g) \cap M') \setminus killed^{M'}_{\mathcal{Q},\mathcal{P}}(N') \neq \emptyset$ because
$H(r'_g)$ contains only standard atoms and $H(r'_g) = H(r_g)$. In addition, from
$N' \subseteq M'$ we conclude $H(r_g) \cap N' \subseteq H(r_g) \cap M'$, and by Definition 3.14,
$N' \cap killed^{M'}_{\mathcal{Q},\mathcal{P}}(N') = \emptyset$ holds. Hence,
$(H(r_g) \cap M') \setminus killed^{M'}_{\mathcal{Q},\mathcal{P}}(N') \supseteq H(r_g) \cap N'$, which is not empty, and so
condition (3) holds.

Otherwise, $\{q_1(\bar{s}_1), \ldots, q_j(\bar{s}_j)\} \not\subseteq N'$. Let $i \in \{1, \ldots, j\}$ be such that
$q_i(\bar{s}_i) \notin N'$ and, for any $q(\bar{s}) \in B^+(r'_g)|_{B_{\mathcal{P}}}$, $q(\bar{s}') \prec^{k^{\alpha}(\bar{t}')}_{r} q_i(\bar{s}'_i)$ implies
$q(\bar{s}) \in N'$ (where $r$ is the rule in $\mathcal{P}$ from which the modified rule $r'$ has
been generated). If $q_i$ is an EDB predicate, the atom $q_i(\bar{s}_i)$ belongs to
$killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ by the definition of killed atoms. Otherwise, $q_i$ is an IDB
predicate and there is a magic rule $r^*_{i,g} \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))$ having an
atom $magic(q_i^{\beta_i}(\bar{s}_i))$ in head, and such that $B^+(r^*_{i,g}) \subseteq N'$. Therefore,
$magic(q_i^{\beta_i}(\bar{s}_i))$ belongs to $N'$, from which $q_i(\bar{s}_i) \in killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ follows from
the definition of killed atoms. Thus, independently of the type (EDB, IDB)
of $q_i$, (2) holds. □

We can now complete the first part of the proof. 

Lemma 3.18 For each stable model $M'$ of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$, there is a stable model
$M$ of $\mathcal{P}$ such that $M \supseteq M'|_{B_{\mathcal{P}}}$.

Proof. Let $M$ be a stable model of $\mathcal{P} \cup M'|_{B_{\mathcal{P}}}$, the program obtained by
adding to $\mathcal{P}$ a fact for each atom in $M'|_{B_{\mathcal{P}}}$. We shall show that $M$ is in fact
a stable model of $\mathcal{P}$ such that $M \supseteq M'|_{B_{\mathcal{P}}}$. Of course, $M$ is a model of $\mathcal{P}$
such that $M \supseteq M'|_{B_{\mathcal{P}}}$. So, the line of the proof is to show that if $M$ is not
stable, then it is possible to build a model $N'$ of $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$ such
that $N' \subset M'$, thereby contradicting the minimality of $M'$ over the models of
$Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$.

Assume, for the sake of contradiction, that $M$ is not stable and let $N \subset M$ be a
model of $Ground(\mathcal{P})^{M}$. Define $N'$ as the interpretation
$(N \cap M'|_{B_{\mathcal{P}}}) \cup (M' \setminus B_{\mathcal{P}})$. By construction, note that $N' \subseteq M'$, since $M'$
coincides with $M'|_{B_{\mathcal{P}}} \cup (M' \setminus B_{\mathcal{P}})$. In fact, in the case where $N' = M'$, we
would have that $N \supseteq M'|_{B_{\mathcal{P}}}$, since $(N \cap M'|_{B_{\mathcal{P}}})$ and $(M' \setminus B_{\mathcal{P}})$ are
disjoint. Hence, $N$ would not only be a model for $Ground(\mathcal{P})^{M}$ but also a
model for $Ground(\mathcal{P} \cup M'|_{B_{\mathcal{P}}})^{M}$, while on the other hand $N \subset M$ holds.
However, this is impossible, since $M$ is a stable model of $\mathcal{P} \cup M'|_{B_{\mathcal{P}}}$. So,
$N' \subset M'$ must hold. Hence, to complete the proof and get a contradiction, it
remains to show that $N'$ is actually a model of $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$, i.e.,
it satisfies all the rules in $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$.
To this end, we have to consider the following two kinds of rules:

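(1) Consider a ground magic rule $r^*_g \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$ such that
$B^+(r^*_g) \subseteq N'$, and let $magic(p^{\alpha}(\bar{t}))$ be the (only) atom in $H(r^*_g)$. Since
$N' \subseteq M'$, $B^+(r^*_g) \subseteq N'$ implies that $B^+(r^*_g) \subseteq M'$. In fact, since $M'$ is a
model of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ and $|H(r^*_g)| = 1$, $magic(p^{\alpha}(\bar{t})) \in M'$ must hold (we
recall that $B^-(r^*_g) = \emptyset$). Moreover, since $B_{\mathcal{P}}$ does not contain any magic
atom, $magic(p^{\alpha}(\bar{t}))$ is also contained in $M' \setminus B_{\mathcal{P}}$. Thus, by the construction
of $N'$, we can conclude that $H(r^*_g) \cap N' \neq \emptyset$.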

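(2) Consider a rule obtained by removing the negative literals from a ground
modified rule $r'_g \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))$ where

\[ r'_g:\ p(\bar{t}) \vee p_1(\bar{t}_1) \vee \cdots \vee p_n(\bar{t}_n) \ \mathtt{:-}\ magic(p^{\alpha}(\bar{t})), magic(p_1^{\alpha_1}(\bar{t}_1)), \ldots,
   magic(p_n^{\alpha_n}(\bar{t}_n)), q_1(\bar{s}_1), \ldots, q_j(\bar{s}_j), \mathtt{not}\ q_{j+1}(\bar{s}_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}_m). \]

and where $B^+(r'_g) \subseteq N'$. Observe that $B^-(r'_g) \cap M' = \emptyset$ holds by the
definition of reduct. Moreover, let $r_g$ be the rule of $Ground(\mathcal{P})$ associated
with $r'_g$ (according to Lemma 3.13):

\[ r_g:\ p(\bar{t}) \vee p_1(\bar{t}_1) \vee \cdots \vee p_n(\bar{t}_n) \ \mathtt{:-}\ q_1(\bar{s}_1), \ldots, q_j(\bar{s}_j),
   \mathtt{not}\ q_{j+1}(\bar{s}_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}_m). \]

We have to show that $H(r'_g) \cap N' \neq \emptyset$. The proof is based on establishing
the following properties on $r'_g$ and $r_g$:

• $M \cap killed^{M'}_{\mathcal{Q},\mathcal{P}}(M') = \emptyset$; (1)

• $(H(r'_g) \setminus M') \cap M = \emptyset$; (2)

• $B^-(r'_g) \cap M = \emptyset$; (3)

• $H(r'_g) \cap M' = H(r'_g) \cap M'|_{B_{\mathcal{P}}} = H(r'_g) \cap M$; (4)

• $H(r_g) \cap N \neq \emptyset$. (5)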

In particular, we shall directly prove (1), and show the following
implications: (1)→(2)∧(3), (2)→(4), and (3)→(5). Eventually, based on (4)
and (5), the fact that $H(r'_g) \cap N' \neq \emptyset$ can be easily derived as follows:
Since $H(r_g) \subseteq B_{\mathcal{P}}$, by the definition of $N'$ we can conclude that
$H(r_g) \cap N' = H(r_g) \cap (N \cap M'|_{B_{\mathcal{P}}}) = (H(r_g) \cap N) \cap (H(r_g) \cap M'|_{B_{\mathcal{P}}})$.
Moreover, because of (4) and the fact that $H(r_g) = H(r'_g)$, $H(r_g) \cap N'$ coincides
in turn with $(H(r_g) \cap N) \cap (H(r_g) \cap M)$. Then, recall that $N \subset M$. Thus,
$H(r_g) \cap N' = H(r_g) \cap N$, which is not empty by (5).

In order to complete the proof, we have to show that all the above
properties actually hold.

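Proof of (1). We recall that, by Proposition 3.17, we already know that
$killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$ is an unfounded set for $\mathcal{P}$ with respect to $\langle M'|_{B_{\mathcal{P}}}, B_{\mathcal{P}}\rangle$.
In fact, one may notice that $killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$ is an unfounded set for
$\mathcal{P} \cup M'|_{B_{\mathcal{P}}}$ with respect to $\langle M'|_{B_{\mathcal{P}}}, B_{\mathcal{P}}\rangle$ too, since the rules added to
$\mathcal{P}$ are facts corresponding to the atoms in $M'|_{B_{\mathcal{P}}}$ and
$M'|_{B_{\mathcal{P}}} \cap killed^{M'}_{\mathcal{Q},\mathcal{P}}(M') = \emptyset$ by Definition 3.14.
Thus, since $M \supseteq M'|_{B_{\mathcal{P}}}$ and $M$ is a stable model of $\mathcal{P} \cup M'|_{B_{\mathcal{P}}}$, we can
apply Theorem 3.11 in order to conclude that $M \cap killed^{M'}_{\mathcal{Q},\mathcal{P}}(M') = \emptyset$.

Proof of (2). After (1), we can just show that $H(r'_g) \setminus M' \subseteq killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$.
In fact, since $N' \subseteq M'$, we note that $B^+(r'_g) \subseteq N'$ implies $B^+(r'_g) \subseteq M'$.
Thus, $H(r'_g) \setminus M' \subseteq killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$ follows by Definition 3.14 and the form
of rule $r'_g$.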

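Proof of (3). After (1), we can just show that $B^-(r'_g) \subseteq killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$.
Actually, we show that the IDB atoms in $B^-(r'_g)$ belong to $killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$,
as EDB atoms in $B^-(r'_g)$ clearly belong to $killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$ because
$B^-(r'_g) \cap M' = \emptyset$ by assumption. To this end, consider a modified rule
$r' \in \mathtt{DMS}(\mathcal{Q},\mathcal{P})$ such that $r'_g = r'\vartheta$ for some substitution $\vartheta$:

\[ r':\ p(\bar{t}') \vee p_1(\bar{t}'_1) \vee \cdots \vee p_n(\bar{t}'_n) \ \mathtt{:-}\ magic(p^{\alpha}(\bar{t}')), magic(p_1^{\alpha_1}(\bar{t}'_1)), \ldots,
   magic(p_n^{\alpha_n}(\bar{t}'_n)), q_1(\bar{s}'_1), \ldots, q_j(\bar{s}'_j), \mathtt{not}\ q_{j+1}(\bar{s}'_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}'_m). \]

During the Generation step preceding the production of $r'$, a magic rule $r^*_i$
with $H(r^*_i) = \{magic(q_i^{\beta_i}(\bar{s}'_i))\}$ and where $B^+(r^*_i) \subseteq B^+(r')$ has been
produced for each $j+1 \le i \le m$ such that $q_i$ is an IDB predicate. Hence, since
the variables of $r^*_i$ are a subset of the variables of $r'$, the substitution $\vartheta$
can be used to map $r^*_i$ to a ground rule $r^*_{i,g} = r^*_i\vartheta$ with
$H(r^*_{i,g}) = \{magic(q_i^{\beta_i}(\bar{s}_i))\}$ and $B^+(r^*_{i,g}) \subseteq B^+(r'_g)$. Now, since
$B^+(r'_g) \subseteq N' \subseteq M'$, we can conclude that $B^+(r^*_{i,g})$ is in turn contained in $M'$.
Thus, the head of $r^*_{i,g}$ must be true with respect to $M'$ (we recall that magic
rules have empty negative bodies). That is, $magic(q_i^{\beta_i}(\bar{s}_i)) \in M'$ holds, for
each $j+1 \le i \le m$ such that $q_i$ is an IDB predicate. Moreover,
$B^-(r'_g) \cap M' = \emptyset$ implies that $q_i(\bar{s}_i) \in B_{\mathcal{P}} \setminus M'$, as $q_i(\bar{s}_i) \in B^-(r'_g)$.
Thus, by Definition 3.14, $q_i(\bar{s}_i) \in killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$.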

Proof of (4). The property immediately follows from (2) and the fact that
$H(r'_g) \subseteq B_{\mathcal{P}}$ and $M \supseteq M'|_{B_{\mathcal{P}}}$.

Proof of (5). Note that $B^-(r_g) = B^-(r'_g)$, and so (3) implies that there is
a rule in $Ground(\mathcal{P})^{M}$ obtained from $r_g$ by removing the atoms in $B^-(r_g)$.
Note also that $B^+(r_g) = B^+(r'_g) \cap B_{\mathcal{P}} \subseteq N' \cap B_{\mathcal{P}}$ (since $B^+(r'_g) \subseteq N'$).
Thus, by the definition of $N'$, $B^+(r_g) \subseteq N$ (more specifically,
$B^+(r_g) \subseteq N \cap M'|_{B_{\mathcal{P}}}$). Moreover, since $N$ is a model of $Ground(\mathcal{P})^{M}$, the
latter entails that $H(r_g) \cap N \neq \emptyset$. □



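Theorem 3.19 Let $\mathcal{Q}$ be a query for a Datalog$^{\vee,\neg_s}$ program $\mathcal{P}$. Then, for
each stable model $M'$ of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$, there is a stable model $M$ of $\mathcal{P}$ such
that $M'|_{\mathcal{Q}} = M|_{\mathcal{Q}}$.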

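Proof. Because of Lemma 3.18, for each stable model $M'$ of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$, there
is a stable model $M$ of $\mathcal{P}$ such that $M \supseteq M'|_{B_{\mathcal{P}}}$. Thus, we trivially have that
$M|_{\mathcal{Q}} \supseteq M'|_{\mathcal{Q}}$ holds. We now show that the inclusion cannot be proper.

In fact, by the definition of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$, the magic seed is associated to any
ground instance of $\mathcal{Q}$. Then $B_{\mathcal{P}}|_{\mathcal{Q}} \setminus M' \subseteq killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$ by Definition 3.14
(we recall that $B_{\mathcal{P}}|_{\mathcal{Q}}$ denotes the ground instances of $\mathcal{Q}$). By Proposition 3.17,
$killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$ is an unfounded set for $\mathcal{P}$ with respect to $\langle M'|_{B_{\mathcal{P}}}, B_{\mathcal{P}}\rangle$.
Hence, by Theorem 3.11, we have that $M \cap killed^{M'}_{\mathcal{Q},\mathcal{P}}(M') = \emptyset$. It follows that
$M \cap (B_{\mathcal{P}}|_{\mathcal{Q}} \setminus M') = \emptyset$. Thus, $M|_{\mathcal{Q}} \setminus M'|_{\mathcal{Q}} = \emptyset$, which combined with
$M|_{\mathcal{Q}} \supseteq M'|_{\mathcal{Q}}$ implies $M|_{\mathcal{Q}} = M'|_{\mathcal{Q}}$. □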



3.3.2 Completeness of the Magic Set Method 

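For the second part of the proof, we construct an interpretation for
$\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ based on one for $\mathcal{P}$.

Definition 3.20 (Magic Variant) Let $I$ be an interpretation for $\mathcal{P}$. We
define an interpretation $variant^{\infty}_{\mathcal{Q},\mathcal{P}}(I)$ for $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$, called the magic
variant of $I$ with respect to $\mathcal{Q}$ and $\mathcal{P}$, as the limit of the following sequence:

\[
\begin{aligned}
variant^{0}_{\mathcal{Q},\mathcal{P}}(I) &= EDB(\mathcal{P}); \text{ and} \\
variant^{i+1}_{\mathcal{Q},\mathcal{P}}(I) &= variant^{i}_{\mathcal{Q},\mathcal{P}}(I) \\
 &\quad \cup \{p(\bar{t}) \in I \mid \text{there is a binding } \alpha \text{ such that } magic(p^{\alpha}(\bar{t})) \in variant^{i}_{\mathcal{Q},\mathcal{P}}(I)\} \\
 &\quad \cup \{magic(p^{\alpha}(\bar{t})) \mid \exists\, r^*_g \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P})) \text{ such that } \\
 &\qquad\quad magic(p^{\alpha}(\bar{t})) \in H(r^*_g) \text{ and } B^+(r^*_g) \subseteq variant^{i}_{\mathcal{Q},\mathcal{P}}(I)\}, \quad \forall i \ge 0.
\end{aligned}
\]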

Example 3.21 Consider the program $\mathtt{DMS}(\mathcal{Q}_{sc},\mathcal{P}_{sc})$ presented in
Section 3.2, the EDB {produced_by(p, c, c1)} and the interpretation
$M_{sc}$ = {produced_by(p, c, c1), sc(c)}. We next compute the magic variant
$variant^{\infty}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc})$ of $M_{sc}$ with respect to $\mathcal{Q}_{sc}$ and $\mathcal{P}_{sc}$. We start
the sequence with the original EDB:
$variant^{0}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc})$ = {produced_by(p, c, c1)}.
For $variant^{1}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc})$, we add magic_sc^b(c) (the query seed), while
for $variant^{2}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc})$, we add sc(c) (because sc(c) $\in M_{sc}$ and
magic_sc^b(c) $\in variant^{1}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc})$), and magic_sc^b(c1) (because
magic_sc^b(c1) :- magic_sc^b(c). is a rule of $Ground(\mathtt{DMS}(\mathcal{Q}_{sc},\mathcal{P}_{sc}))$ and
magic_sc^b(c) $\in variant^{1}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc})$). Any other element of the sequence
coincides with $variant^{2}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc})$, and so also
$variant^{\infty}_{\mathcal{Q}_{sc},\mathcal{P}_{sc}}(M_{sc})$. □
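
The sequence of Definition 3.20 is a plain fixpoint iteration, which the following Python sketch replays on the data of Example 3.21. As before, this is an illustration under assumptions: atoms are pairs (predicate, arguments), magic predicates are named `magic_<pred>^<adornment>` (a convention of the sketch), and ground magic rules are given explicitly as (head, positive body) pairs, with the query seed encoded as a rule with an empty body.

```python
from itertools import product


def magic_atoms(atom):
    """Yield every magic version of a standard atom (sketch naming scheme)."""
    pred, args = atom
    for adornment in product("bf", repeat=len(args)):
        bound = tuple(a for a, flag in zip(args, adornment) if flag == "b")
        yield ("magic_" + pred + "^" + "".join(adornment), bound)


def magic_variant(interp, edb, ground_magic_rules):
    """Limit of the sequence variant^0, variant^1, ... of Definition 3.20."""
    variant = set(edb)                                   # variant^0 = EDB
    while True:
        # standard atoms of I whose magic version is already in the variant
        new = {a for a in interp
               if any(m in variant for m in magic_atoms(a))}
        # heads of ground magic rules whose positive body is in the variant
        new |= {head for head, body in ground_magic_rules if body <= variant}
        if new <= variant:                               # fixpoint reached
            return variant
        variant |= new


PB = ("produced_by", ("p", "c", "c1"))
SEED = ("magic_sc^b", ("c",))
M_SC = {PB, ("sc", ("c",))}
MAGIC_RULES = [
    (SEED, frozenset()),                                 # query seed
    (("magic_sc^b", ("c1",)), frozenset({SEED})),        # magic_sc^b(c1) :- magic_sc^b(c).
]
VARIANT = magic_variant(M_SC, edb={PB}, ground_magic_rules=MAGIC_RULES)
```

The loop adds the query seed in the first round and sc(c) together with magic_sc^b(c1) in the second, after which nothing changes, matching the three steps spelled out in the example.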

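By definition, for a magic variant $variant^{\infty}_{\mathcal{Q},\mathcal{P}}(I)$ of an interpretation $I$ with
respect to $\mathcal{Q}$ and $\mathcal{P}$, $variant^{\infty}_{\mathcal{Q},\mathcal{P}}(I)|_{B_{\mathcal{P}}} \subseteq I$ holds. More interestingly,
the magic variant of a stable model for $\mathcal{P}$ is in turn a stable model for
$\mathtt{DMS}(\mathcal{Q},\mathcal{P})$.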

Example 3.22 The magic variant of $M_{sc}$ with respect to $\mathcal{Q}_{sc}$ and $\mathcal{P}_{sc}$ (see
Example 3.21) coincides with the interpretation $M'_{sc}$ introduced in Example 3.15.
From previous examples, we know that $M_{sc}$ is a stable model of $\mathcal{P}_{sc}$, and $M'_{sc}$
is a stable model of $\mathtt{DMS}(\mathcal{Q}_{sc},\mathcal{P}_{sc})$. □

The following two lemmas formalize the intuition above, with the latter being 
the counterpart of Lemma 3.18. 

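Lemma 3.23 For each stable model $M$ of $\mathcal{P}$, the magic variant
$M' = variant^{\infty}_{\mathcal{Q},\mathcal{P}}(M)$ of $M$ is a model of $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$ with
$M \supseteq M'|_{B_{\mathcal{P}}}$.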

Proof. As $M'$ is the magic variant of the stable model $M$, we trivially
have that $M \supseteq M'|_{B_{\mathcal{P}}}$ holds. We next show that $M'$ is a model of
$Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$. To this end, consider a rule in
$Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$ having the body true, that is, a rule obtained by
removing the negative body literals from a rule $r'_g \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))$ such
that $B^-(r'_g) \cap M' = \emptyset$ and $B^+(r'_g) \subseteq M'$ hold. We have to show that
$H(r'_g) \cap M' \neq \emptyset$.

In the case where $r'_g$ is a magic rule, then $B^+(r'_g) \subseteq M'$ implies that the (only)
atom in $H(r'_g)$ belongs to $M'$ (by Definition 3.20). The only remaining (slightly
more involved) case to be analyzed is where $r'_g$ is a modified rule of the form

\[ r'_g:\ p(\bar{t}) \vee p_1(\bar{t}_1) \vee \cdots \vee p_n(\bar{t}_n) \ \mathtt{:-}\ magic(p^{\alpha}(\bar{t})), magic(p_1^{\alpha_1}(\bar{t}_1)), \ldots,
   magic(p_n^{\alpha_n}(\bar{t}_n)), q_1(\bar{s}_1), \ldots, q_j(\bar{s}_j), \mathtt{not}\ q_{j+1}(\bar{s}_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}_m). \]

In this case, we first apply as usual Lemma 3.13 in order to conclude the
existence of a rule $r_g \in Ground(\mathcal{P})$ of the form

\[ r_g:\ p(\bar{t}) \vee p_1(\bar{t}_1) \vee \cdots \vee p_n(\bar{t}_n) \ \mathtt{:-}\ q_1(\bar{s}_1), \ldots, q_j(\bar{s}_j),
   \mathtt{not}\ q_{j+1}(\bar{s}_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}_m). \]

Then, we claim that the following two properties hold:

• $B^-(r_g) \cap M = \emptyset$; (6)

• $B^+(r_g) \subseteq M$. (7)

These properties are in fact all we need to establish the result. Indeed,
since $M$ is a model of $Ground(\mathcal{P})^{M}$, (6) and (7) imply $H(r_g) \cap M \neq \emptyset$.
So, we can recall that $H(r_g) = H(r'_g)$, and hence let $p_i(\bar{t}_i)$ be an atom in
$H(r_g) \cap M = H(r'_g) \cap M$ and $magic(p_i^{\alpha_i}(\bar{t}_i))$ be its corresponding magic atom
in $B^+(r'_g)$ ($i \in \{\epsilon, 1, \ldots, n\}$, where $\epsilon$ is the empty string). Since
$B^+(r'_g) \subseteq M'$ (by hypothesis) and since $p_i(\bar{t}_i) \in M$, we can then conclude that
$p_i(\bar{t}_i)$ is in $M'$ as well by Definition 3.20. That is, $H(r'_g) \cap M' \neq \emptyset$.

Let us now finalize the proof, by showing that the above properties actually hold.

Proof of (6). Consider a modified rule $r' \in \mathtt{DMS}(\mathcal{Q},\mathcal{P})$ such that $r'_g = r'\vartheta$
for a substitution $\vartheta$:

\[ r':\ p(\bar{t}') \vee p_1(\bar{t}'_1) \vee \cdots \vee p_n(\bar{t}'_n) \ \mathtt{:-}\ magic(p^{\alpha}(\bar{t}')), magic(p_1^{\alpha_1}(\bar{t}'_1)), \ldots,
   magic(p_n^{\alpha_n}(\bar{t}'_n)), q_1(\bar{s}'_1), \ldots, q_j(\bar{s}'_j), \mathtt{not}\ q_{j+1}(\bar{s}'_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}'_m). \]

and the rule $r \in \mathcal{P}$ from which $r'$ is produced (such that $r_g = r\vartheta$):

\[ r:\ p(\bar{t}') \vee p_1(\bar{t}'_1) \vee \cdots \vee p_n(\bar{t}'_n) \ \mathtt{:-}\ q_1(\bar{s}'_1), \ldots, q_j(\bar{s}'_j),
   \mathtt{not}\ q_{j+1}(\bar{s}'_{j+1}), \ldots, \mathtt{not}\ q_m(\bar{s}'_m). \]

During the Generation step preceding the production of $r'$, a magic rule $r^*_i$
such that $H(r^*_i) = \{magic(q_i^{\beta_i}(\bar{s}'_i))\}$ has been produced for each
$j+1 \le i \le m$ such that $q_i$ is an IDB predicate. Hence, since the variables of
$r^*_i$ are a subset of the variables of $r'$, the substitution $\vartheta$ can be used to
map $r^*_i$ to a ground rule $r^*_{i,g} = r^*_i\vartheta$ such that
$H(r^*_{i,g}) = \{magic(q_i^{\beta_i}(\bar{s}_i))\}$ and $B^+(r^*_{i,g}) \subseteq B^+(r'_g)$ (we recall that magic
rules have empty negative body). Now, since $B^+(r'_g) \subseteq M'$, we can conclude
that $B^+(r^*_{i,g})$ is in turn contained in $M'$. Thus, by the construction of $M'$,
the head of $r^*_{i,g}$ must be true with respect to $M'$, that is,
$magic(q_i^{\beta_i}(\bar{s}_i)) \in M'$ holds for each $j+1 \le i \le m$ such that $q_i$ is an IDB
predicate. So, if some (IDB) atom $q_i(\bar{s}_i) \in B^-(r_g)$ belongs to $M$, by
Definition 3.20 we can conclude that $q_i(\bar{s}_i) \in M'$, which contradicts the
assumption that $B^-(r'_g) \cap M' = \emptyset$ (we recall that $B^-(r_g) = B^-(r'_g)$). This
proves that atoms with IDB predicates in $B^-(r_g)$ do not occur in $M$. The same
trivially holds for atoms with EDB predicates too, since
$B^-(r_g) \cap M' = B^-(r'_g) \cap M' = \emptyset$ and $M' \supseteq EDB(\mathcal{P})$ (by the definition of
magic variant).

Proof of (7). The property straightforwardly follows from the fact that
$B^+(r_g) = B^+(r'_g)|_{B_{\mathcal{P}}}$, and since $M \supseteq M'|_{B_{\mathcal{P}}}$ and $B^+(r'_g) \subseteq M'$ hold by the
construction of $M'$ and by the initial hypothesis on the choice of $r'_g$,
respectively. □

Lemma 3.24 For each stable model $M$ of $\mathcal{P}$, there is a stable model $M'$ of
$\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ (which is the magic variant of $M$) such that $M \supseteq M'|_{B_{\mathcal{P}}}$.

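Proof. After Lemma 3.23, we can show that $M' = variant^{\infty}_{\mathcal{Q},\mathcal{P}}(M)$ is also
minimal over all the models of $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$. Let $N' \subseteq M'$ be a
minimal model of $Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$. We prove by induction on the
definition of the magic variant that $M'$ is in turn contained in $N'$. The base
case (i.e., $variant^{0}_{\mathcal{Q},\mathcal{P}}(M) \subseteq N'$) is clearly true, since $variant^{0}_{\mathcal{Q},\mathcal{P}}(M)$
contains only EDB facts. Suppose $variant^{i}_{\mathcal{Q},\mathcal{P}}(M) \subseteq N'$ in order to prove that
$variant^{i+1}_{\mathcal{Q},\mathcal{P}}(M) \subseteq N'$ holds as well.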

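While considering an atom in $variant^{i+1}_{\mathcal{Q},\mathcal{P}}(M) \setminus variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$, we
distinguish two cases:

(a) For a magic atom $magic(p^{\alpha}(\bar{t}))$ in $variant^{i+1}_{\mathcal{Q},\mathcal{P}}(M) \setminus variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$, by
Definition 3.20 there must be a rule $r^*_g \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))$ having
$H(r^*_g) = \{magic(p^{\alpha}(\bar{t}))\}$ and $B^+(r^*_g) \subseteq variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$ (we recall that
magic rules have empty negative body and so $r^*_g \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$
holds). We can then conclude that $B^+(r^*_g) \subseteq N'$ holds by the induction
hypothesis and so $magic(p^{\alpha}(\bar{t})) \in N'$ (because $N'$ is a model of
$Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$).

(b) For a standard atom $p(\bar{t})$ in $variant^{i+1}_{\mathcal{Q},\mathcal{P}}(M) \setminus variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$, by
Definition 3.20 there is a binding $\alpha$ such that $magic(p^{\alpha}(\bar{t})) \in variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$
and the atom $p(\bar{t})$ belongs to $M$. Assume for the sake of contradiction
that $p(\bar{t}) \notin N'$. Since $M'$ is a model of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ and $N'$ is a model of
$Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{M'}$, we can compute the set $killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ as introduced
in Section 3.3.1 and note, in particular, that $p(\bar{t}) \in killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ holds by
definition, because $magic(p^{\alpha}(\bar{t})) \in variant^{i}_{\mathcal{Q},\mathcal{P}}(M) \subseteq N'$ by the induction
hypothesis. Moreover, by Proposition 3.17, $killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ is an unfounded set
for $\mathcal{P}$ with respect to $\langle M'|_{B_{\mathcal{P}}}, B_{\mathcal{P}}\rangle$. In addition, $M \supseteq M'|_{B_{\mathcal{P}}}$ holds by
Definition 3.20. Thus, $M$ is a stable model for $\mathcal{P}$ such that $M \supseteq M'|_{B_{\mathcal{P}}}$, and we
can hence apply Theorem 3.11 in order to conclude that
$M \cap killed^{M'}_{\mathcal{Q},\mathcal{P}}(N') = \emptyset$. The latter is in contradiction with
$p(\bar{t}) \in killed^{M'}_{\mathcal{Q},\mathcal{P}}(N')$ and $p(\bar{t}) \in M$. Hence, $p(\bar{t}) \in N'$. □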

We can then prove the correspondence of stable models with respect to queries. 

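Theorem 3.25 Let $\mathcal{Q}$ be a query for a Datalog$^{\vee,\neg_s}$ program $\mathcal{P}$. Then, for each
stable model $M$ of $\mathcal{P}$, there is a stable model $M'$ of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ (which is the
magic variant of $M$) such that $M'|_{\mathcal{Q}} = M|_{\mathcal{Q}}$.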

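Proof. Let $M$ be a stable model of $\mathcal{P}$ and $M' = variant^{\infty}_{\mathcal{Q},\mathcal{P}}(M)$ its magic
variant. Because of Lemma 3.24, $M'$ is a stable model of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ such that
$M \supseteq M'|_{B_{\mathcal{P}}}$. Thus, we trivially have that $M|_{\mathcal{Q}} \supseteq M'|_{\mathcal{Q}}$ holds. We now show
the reverse inclusion.

Since $M'$ is a stable model of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$, we can determine the set
$killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$ as defined in Section 3.3.1. Hence, by Definition 3.14 we can
conclude that (a) $B_{\mathcal{P}}|_{\mathcal{Q}} \setminus M' \subseteq killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$ because $M'$ contains the magic
seed by construction (we recall that $B_{\mathcal{P}}|_{\mathcal{Q}}$ denotes the ground instances of $\mathcal{Q}$).
Moreover, since $M$ is a stable model of $\mathcal{P}$ with $M \supseteq M'|_{B_{\mathcal{P}}}$ and
$killed^{M'}_{\mathcal{Q},\mathcal{P}}(M')$ is an unfounded set for $\mathcal{P}$ with respect to $\langle M'|_{B_{\mathcal{P}}}, B_{\mathcal{P}}\rangle$ by
Proposition 3.17, we can conclude that (b) $M \cap killed^{M'}_{\mathcal{Q},\mathcal{P}}(M') = \emptyset$ by
Theorem 3.11. Thus, by combining (a) and (b) we obtain that
$(B_{\mathcal{P}}|_{\mathcal{Q}} \setminus M') \cap M = \emptyset$, which is equivalent to $M|_{\mathcal{Q}} \subseteq M'|_{\mathcal{Q}}$. □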

Finally, we show the correctness of the Magic Set method with respect to 
query answering, that is, we prove that the original and rewritten programs 
provide the same answers for the input query on all possible EDBs. 

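Theorem 3.26 Let $\mathcal{P}$ be a Datalog$^{\vee,\neg_s}$ program, and let $\mathcal{Q}$ be a query. Then
$\mathtt{DMS}(\mathcal{Q},\mathcal{P}) \equiv^{b}_{\mathcal{Q}} \mathcal{P}$ and $\mathtt{DMS}(\mathcal{Q},\mathcal{P}) \equiv^{c}_{\mathcal{Q}} \mathcal{P}$ hold.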

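Proof. We want to show that, for any set of facts $\mathcal{F}$ defined over the EDB
predicates of $\mathcal{P}$ (and $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$),
$Ans_b(\mathcal{Q}, \mathtt{DMS}(\mathcal{Q},\mathcal{P}) \cup \mathcal{F}) = Ans_b(\mathcal{Q}, \mathcal{P} \cup \mathcal{F})$ and
$Ans_c(\mathcal{Q}, \mathtt{DMS}(\mathcal{Q},\mathcal{P}) \cup \mathcal{F}) = Ans_c(\mathcal{Q}, \mathcal{P} \cup \mathcal{F})$ hold. We first observe that
the Magic Set rewriting does not depend on EDB facts; thus,
$\mathtt{DMS}(\mathcal{Q},\mathcal{P}) \cup \mathcal{F} = \mathtt{DMS}(\mathcal{Q}, \mathcal{P} \cup \mathcal{F})$ holds. Moreover, note that
Datalog$^{\vee,\neg_s}$ programs always have stable models. Therefore, as a direct
consequence of Theorem 3.19 and Theorem 3.25, we can conclude
$Ans_b(\mathcal{Q}, \mathtt{DMS}(\mathcal{Q}, \mathcal{P} \cup \mathcal{F})) = Ans_b(\mathcal{Q}, \mathcal{P} \cup \mathcal{F})$ and
$Ans_c(\mathcal{Q}, \mathtt{DMS}(\mathcal{Q}, \mathcal{P} \cup \mathcal{F})) = Ans_c(\mathcal{Q}, \mathcal{P} \cup \mathcal{F})$. □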

3.4 Magic Sets for Stratified Datalog Programs without Disjunction 

Stratified Datalog programs without disjunction have exactly one stable model 
[29]. However, the Magic Set transformation can introduce new dependencies 
between predicates, possibly resulting in unstratified programs (we refer to the 
analysis in [38]). Clearly, original and rewritten programs agree on the query, as 
proved in the previous section, but the question whether the rewritten program 
admits a unique stable model is also important. In fact, for programs having 
the unique stable model property, brave and cautious reasoning coincide and 
a solver can immediately answer the query after the first (and unique) stable 
model is found. The following theorem states that the rewritten program of a 
stratified program indeed has a unique stable model. 

Theorem 3.27 Let $\mathcal{P}$ be a disjunction-free Datalog program with stratified
negation and $\mathcal{Q}$ a query. Then $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ has a unique stable model.

Proof. Let $M$ be the unique stable model of $\mathcal{P}$, and $M' = variant^{\infty}_{\mathcal{Q},\mathcal{P}}(M)$
its magic variant as presented in Definition 3.20. By Lemma 3.24 we already
know that $M'$ is a stable model of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$. We now show that any stable
model $N'$ of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$ contains $M'$ by induction on the structure of $M'$.
The base case ($variant^{0}_{\mathcal{Q},\mathcal{P}}(M) \subseteq N'$) is clearly true, since $variant^{0}_{\mathcal{Q},\mathcal{P}}(M)$
contains only EDB facts. Suppose $variant^{i}_{\mathcal{Q},\mathcal{P}}(M) \subseteq N'$ in order to prove
that $variant^{i+1}_{\mathcal{Q},\mathcal{P}}(M) \subseteq N'$ holds as well. Thus, while considering an atom in
$variant^{i+1}_{\mathcal{Q},\mathcal{P}}(M) \setminus variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$, two cases are possible:

(1) For a magic atom $magic(p^{\alpha}(\bar{t}))$ in $variant^{i+1}_{\mathcal{Q},\mathcal{P}}(M) \setminus variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$, by
Definition 3.20 there must be a rule $r^*_g \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))$ having
$H(r^*_g) = \{magic(p^{\alpha}(\bar{t}))\}$ and $B^+(r^*_g) \subseteq variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$ (we recall that
magic rules have empty negative bodies and so $r^*_g \in Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{N'}$
holds). We can then conclude that $B^+(r^*_g) \subseteq N'$ holds by the induction
hypothesis and so $magic(p^{\alpha}(\bar{t})) \in N'$ (because $N'$ is a model of
$Ground(\mathtt{DMS}(\mathcal{Q},\mathcal{P}))^{N'}$).

(2) For a standard atom $p(\bar{t})$ in $variant^{i+1}_{\mathcal{Q},\mathcal{P}}(M) \setminus variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$, by
Definition 3.20 there is a binding $\alpha$ such that $magic(p^{\alpha}(\bar{t})) \in variant^{i}_{\mathcal{Q},\mathcal{P}}(M)$
and the atom $p(\bar{t})$ belongs to $M$. Assume for the sake of contradiction that
$p(\bar{t}) \notin N'$. Since $N'$ is a stable model of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$, we can compute the set
$killed^{N'}_{\mathcal{Q},\mathcal{P}}(N')$ as introduced in Section 3.3.1 and note, in particular, that
$p(\bar{t}) \in killed^{N'}_{\mathcal{Q},\mathcal{P}}(N')$ holds, by definition. Moreover, by Proposition 3.17,
$killed^{N'}_{\mathcal{Q},\mathcal{P}}(N')$ is an unfounded set for $\mathcal{P}$ with respect to $\langle N'|_{B_{\mathcal{P}}}, B_{\mathcal{P}}\rangle$. In
addition, by Lemma 3.18 there is a stable model $N$ of $\mathcal{P}$ such that
$N \supseteq N'|_{B_{\mathcal{P}}}$, which would mean that $p(\bar{t}) \notin N$ holds. Hence, we can conclude
that $N$ and $M$ are two different stable models of $\mathcal{P}$, obtaining a contradiction,
as $\mathcal{P}$ has a unique stable model.

Since stable models are incomparable with respect to containment, $M' \subseteq N'$
implies $M' = N'$. Hence, $M'$ is the unique stable model of $\mathtt{DMS}(\mathcal{Q},\mathcal{P})$. □



4 Implementation 

The Dynamic Magic Set method (DMS) has been implemented and integrated
into the core of the DLV [43] system. In this section, we shall first briefly
describe the architecture of the system and its usage. We then briefly present an
optimization for eliminating redundant rules, which are sometimes introduced
during the Magic Set rewriting.

4.1 System Architecture and Usage



We have created a prototype system by implementing the Magic Set technique
described in Section 3 inside DLV, as shown in the architecture reported
in Figure 6. DLV supports both brave and cautious reasoning, and for a
completely ground query it can also be used for computing all stable models
in which the query is true. DLV performs brave reasoning if invoked with the
command-line option -FB, while -FC indicates cautious reasoning.

In our prototype, the DMS algorithm is applied automatically by default when
the user invokes DLV with -FB or -FC together with a (partially) bound query.
Magic Sets are not applied by default if the query does not contain any
constant. The user can modify this default behavior by specifying the
command-line options -ODMS (for applying Magic Sets) or -ODMS- (for disabling
Magic Sets).

If a completely bound query is specified, DLV can print the magic variant of
the stable model (not displaying magic predicates), which witnesses the truth
(for brave reasoning) or the falsity (for cautious reasoning) of the query, by
specifying the command-line option --print-model.

Within DLV, DMS is applied immediately after parsing the program and the
query by the Magic Set Rewriter module. The rewritten (and optimized as
described in Section 4.2) program is then processed by the Intelligent Grounding
module and the Model Generator module using the implementation of DLV.
The only other modification is for the output and its filtering: For ground
queries, the witnessing stable model is no longer printed by default, but only
if --print-model is specified, in which case the magic predicates are omitted
from the output.

The SIPS schema ^ implemented in the prototype is as follows: For a rule $r$,
head atom $p(\bar{t})$ and binding $\alpha$, $\prec^{p^{\alpha}(\bar{t})}_{r}$ satisfies the conditions of
Definition 3.3, in particular $p(\bar{t}) \prec^{p^{\alpha}(\bar{t})}_{r} q(\bar{s})$ holds for all
$q(\bar{s}) \neq p(\bar{t})$ in $r$, and $q(\bar{s}) \not\prec^{p^{\alpha}(\bar{t})}_{r} b(\bar{z})$ holds for all head or negative
body atoms $q(\bar{s}) \neq p(\bar{t})$ and any atom $b(\bar{z})$ in $r$. Moreover, all the positive
body literals of $r$ form a chain in $\prec^{p^{\alpha}(\bar{t})}_{r}$. This chain is constructed by
iteratively inserting those atoms containing most bound arguments
(considering $\alpha$ and also the partially formed chain and $f^{p^{\alpha}(\bar{t})}_{r}$) into the
chain. Among the atoms with most bindings an arbitrary processing order
(usually the order appearing in the original rule body) is used. Furthermore,
$f^{p^{\alpha}(\bar{t})}_{r}(q(\bar{s})) \ni X$ holds if and only if $q(\bar{s})$ belongs to the positive body
of $r$, has at least one bound argument and $X$ occurs in $\bar{s}$.

^ Since technically a SIPS has a definition for every single rule, implementations
use a schema for creating the SIPS for a given rule.

Fig. 6. Prototype system architecture

This means that apart from the head atom via which the rule is adorned, only 
positive body atoms can yield variable bindings and only if at least one of 
their arguments is bound, but both atoms with EDB and IDB predicates can 
do so. Moreover, atoms with more bound arguments will be processed before 
those with fewer bound arguments. 
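
The greedy construction of the chain can be sketched in Python as follows. This is a hedged illustration rather than the code used inside DLV: the function name `sips_chain`, the encoding of atoms as (predicate, arguments) pairs, the convention that variables start with an upper-case letter, and the example rule are all assumptions of the sketch. Constants are counted as bound arguments here.

```python
def is_var(term):
    """Datalog convention assumed here: variables start with an upper-case letter."""
    return term[:1].isupper()


def count_bound(atom, bound_vars):
    """Number of arguments of `atom` that are constants or bound variables."""
    return sum(1 for t in atom[1] if not is_var(t) or t in bound_vars)


def sips_chain(head_args, adornment, positive_body):
    """Order the positive body atoms greedily by number of bound arguments.

    Ties are broken by the original body order. An atom passes on all of its
    variables only if it has at least one bound argument when it is selected.
    """
    bound_vars = {t for t, flag in zip(head_args, adornment)
                  if flag == "b" and is_var(t)}
    remaining = list(positive_body)
    chain = []
    while remaining:
        # max() returns the first maximal element, i.e. original order on ties
        best = max(remaining, key=lambda a: count_bound(a, bound_vars))
        remaining.remove(best)
        chain.append(best)
        if count_bound(best, bound_vars) > 0:
            bound_vars |= {t for t in best[1] if is_var(t)}
    return chain


# Hypothetical rule  p(X,Y) :- a(Y,W), b(X,Z), c(Z,Y).  adorned with "bf".
BODY = [("a", ("Y", "W")), ("b", ("X", "Z")), ("c", ("Z", "Y"))]
CHAIN = sips_chain(("X", "Y"), "bf", BODY)
```

With X bound by the head, b(X,Z) is selected first and binds Z, which makes c(Z,Y) the next choice and finally leaves a(Y,W) with one bound argument.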

Note that in this work we did not study the impact of trying different SIPS 
schemas, as we wanted to focus on showing the impact that our technique can 
have, rather than fine-tuning its parameters. While we believe that the SIPS 
schema employed is well-motivated, there probably is quite a bit of room for 
improvement, which we leave for future work. 

An executable of the DLV system supporting the Magic Set optimization is
available at http://www.dlvsystem.com/magic/.



4.2 Dealing with Redundant Rules



Even though our rewriting algorithm keeps the number of generated rules
low, it might happen that some redundant rules are generated when adorning
disjunctive rules, thereby somewhat deteriorating the optimization effort. For
instance, in Example 3.6 the first two modified rules are semantically
equivalent, and this might happen even if the two head predicates differ. In
general not only duplicated rules might be created, but also rules which are
logically subsumed by other rules in the program. Let us first give the
definition of subsumption for Datalog$^{\vee,\neg_s}$ rules.

Definition 4.1 Let $\mathcal{P}$ be a Datalog$^{\vee,\neg_s}$ program, and let $r$ and $r'$ be two rules of
$\mathcal{P}$. Then, $r$ is subsumed by $r'$ (denoted by $r \sqsubseteq r'$) if there exists a substitution
$\vartheta$ for the variables of $r'$, such that $H(r')\vartheta \subseteq H(r)$ and $B(r')\vartheta \subseteq B(r)$. A rule
$r$ is redundant if there exists a rule $r'$ such that $r \sqsubseteq r'$.

Ideally, a Magic Set rewriting algorithm should be capable of identifying all the
possible redundant rules and removing them from the output. Unfortunately,
this approach is unlikely to be feasible in polynomial time, given that
subsumption checking on first-order expressions is NP-complete (problem [LO18]
in [27]).

Thus, in order to identify whether a rule $r$ produced during the Magic Set 
transformation is redundant, we pragmatically apply a greedy subsumption 
algorithm in our implementation, for checking whether $r \sqsubseteq r'$ holds for some 
rule $r'$. In particular, the employed heuristic aims at building the substitution 
$\vartheta$ (as in Definition 4.1) by iteratively choosing an atom $p(\bar{t})$ (which is not yet 
processed) from $r'$ and by matching it (if possible) with some atom of $r$. 
The greedy approach prefers those atoms of $r'$ with the maximum number of 
variables not yet matched. 
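
The greedy strategy described above can be illustrated by the following minimal Python sketch. It is not the DLV implementation: the rule representation (pairs of head and body atom lists), the convention that variables start with an uppercase letter, and the names `match` and `greedy_subsumes` are assumptions made only for this illustration. Since no backtracking is performed, the test is sound but incomplete: a positive answer proves subsumption, while a negative answer may miss it.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]     # (predicate, arguments); hypothetical encoding
Rule = Tuple[List[Atom], List[Atom]]   # (head atoms, body atoms)


def is_var(term: str) -> bool:
    # Assumption: variables start with an uppercase letter.
    return term[:1].isupper()


def match(atom: Atom, target: Atom, theta: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend theta so that atom*theta equals target; return None if impossible."""
    if atom[0] != target[0] or len(atom[1]) != len(target[1]):
        return None
    extended = dict(theta)
    for s, t in zip(atom[1], target[1]):
        if is_var(s):
            if extended.setdefault(s, t) != t:   # variable already bound differently
                return None
        elif s != t:                             # constants must coincide
            return None
    return extended


def greedy_subsumes(r_prime: Rule, r: Rule) -> bool:
    """Greedy (sound, incomplete) test of whether r is subsumed by r_prime."""
    theta: Dict[str, str] = {}
    # Head atoms of r' may only be mapped into H(r), body atoms into B(r).
    pending = [(a, r[0]) for a in r_prime[0]] + [(a, r[1]) for a in r_prime[1]]
    while pending:
        # Prefer the atom of r' with the most variables not yet matched.
        pending.sort(key=lambda p: -sum(
            1 for t in set(p[0][1]) if is_var(t) and t not in theta))
        atom, candidates = pending.pop(0)
        for target in candidates:
            extended = match(atom, target, theta)
            if extended is not None:
                theta = extended                 # commit to the first match, no backtracking
                break
        else:
            return False
    return True
```

For example, with r' being `p(X) v q(Y) :- a(X,Y)` and r being `p(Z) v q(Z) v s(Z) :- a(Z,Z), b(Z)`, the sketch first matches the body atom (two unmatched variables), obtains the substitution mapping both X and Y to Z, and then maps both head atoms into the head of r.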

To turn on subsumption checking (applied once after the Magic Set rewriting), 
DLV has to be invoked with the command-line option -ODMS+. 



5 Experiments on Standard Benchmarks 

We performed several experiments for assessing the effectiveness of the pro- 
posed technique. In this section we present the results obtained on various 
standard benchmarks, most of which have been directly adopted from the lit- 
erature. Further experiments on an application scenario using real-world data 
will be discussed in detail in Section 6. We also refer to [45,54], which contain 
performance evaluations involving DMS: in [45] DLV with DMS was tested on 
Semantic Web reasoning tasks and compared with a heterogeneous set of systems, 
while in [54] the system KAON2, which includes a version of DMS, is compared 
with other ontology systems. In both publications the impact of Magic Sets 
is stated explicitly. 

5.1 Compared Methods, Benchmark Problems and Data 

In order to evaluate the impact of the proposed method, we have compared 
DMS (using the SIPS outlined in Section 4) both with the traditional 
DLV evaluation without Magic Sets and with the SMS method proposed in 
[33]. Concerning SMS, we were not able to obtain an implementation, and have 
therefore performed the rewriting manually. As a consequence, the runtime 
measures obtained for SMS do not contain the time needed for rewriting, while 
it is included for DMS. 

For the comparison, we consider the following benchmark problems. The first 
three of them had already been used to assess SMS in [33], to which we refer 
for details: 

• Simple Path: Given a directed graph $G$ and two nodes $a$ and $b$, does there 
exist a unique path connecting $a$ to $b$ in $G$? The instances are encoded 
by facts edge(v1,v2) for each arc (v1,v2) in $G$, while the problem itself is 
encoded by the program^ 

sp(X,X) v not_sp(X,X) :- edge(X,Y). 
sp(X,Y) v not_sp(X,Y) :- sp(X,Z), edge(Z,Y). 
path(X,Y) :- sp(X,Y). 
path(X,Y) :- not_sp(X,Y). 

not_sp(X,Z) :- path(X,Y1), path(X,Y2), Y1 <> Y2, 
               edge(Y1,Z), edge(Y2,Z). 

with the query sp(a, b). The structure of the graph, which is the same as 
the one reported in [33], consists of a square matrix of nodes connected as 
shown in Figure 7, and the instances have been generated by varying the 
number of nodes. 

^ The first rule of the program models that for each node X of $G$, a unique path 
connecting X with itself can either exist or not. 

Fig. 7. Instance structure of Simple Path and Related (left) and of Conformant 
Plan Checking (right) 

• Related: Given a genealogy graph storing information about relationships 
(father/brother) among people and given two people p1 and p2, is p1 an 
ancestor of p2? The instances are encoded by facts related(p1, p2) when p1 
is known to be related to p2, that is, when p1 is the father or a brother of 
p2. The problem can be encoded by the program 

father(X,Y) v brother(X,Y) :- related(X,Y). 
ancestor(X,Y) :- father(X,Y). 
ancestor(X,Y) :- father(X,Z), ancestor(Z,Y). 

and the query is ancestor(p1, p2). The structure of the "genealogy" graph 
is the same as the one presented in [33] and coincides with the one used 
for testing Simple Path. Also in this case, the instances are generated by 
varying the number of nodes (thus the number of persons in the genealogy) 
of the graph. 

• Strategic Companies: This is a slight variant of the problem domain used in 
the running example. The description here is of the problem as posed in the 
Third ASP Competition. We consider a collection $C$ of companies, where 
each company produces some goods in a set $G$ and each company $c_i \in C$ is 
controlled by a set of owner companies $O_i \subseteq C$. A subset of the companies 
$C' \subseteq C$ is a strategic set if it is a minimal set of companies producing all the 
goods in $G$, such that if $O_i \subseteq C'$ for some $i = 1, \ldots, m$ then $c_i \in C'$ must 
hold. As in the Second Answer Set Competition,^ we assume that each 
product is produced by at most four companies, and that each company is 
controlled by at most four companies (the problem is as hard under 
these restrictions as without them). Given two distinct companies 
$c_i, c_j \in C$, is there a strategic set of $C$ which contains both $c_i$ and $c_j$? The 
instances are encoded by facts produced_by(p, c1, c2, c3, c4) when product 
p is produced by companies c1, c2, c3, and c4; if p is produced by fewer than 
four companies (but at least one), then c1, c2, c3, c4 contains repetitions of 
companies. Moreover, facts controlled_by(c, c1, c2, c3, c4) represent that 
company c is controlled by companies c1, c2, c3, and c4; again, if c is controlled 
by fewer than four companies, then c1, c2, c3, c4 contains repetitions. 

^ http://www.cs.kuleuven.be/~dtai/events/ASP-competition/index.shtml 






The problem can be encoded by the program 

st(C1) v st(C2) v st(C3) v st(C4) :- produced_by(P,C1,C2,C3,C4). 
st(C) :- controlled_by(C,C1,C2,C3,C4), st(C1), st(C2), st(C3), st(C4). 

with the query st(ci), st(cj). While the language presented in the previous 
sections allowed only for one atom in a query for simplicity, the implementation 
in DLV allows for a conjunction in a query. It is easy to see that a 
conjunctive query can be emulated by adding a rule with the conjunction in the 
body and an atom with a new predicate in the head, which contains all body 
arguments, and then replacing the query conjunction with this atom. In 
this case this would mean adding a rule q(ci,cj) :- st(ci), st(cj) and replacing 
the query by q(ci,cj). For this benchmark we used the instances 
submitted for the Second Answer Set Competition. 
• Conformant Plan Checking: In addition, we have included a benchmark 
problem, which highlights the fact that our Magic Set technique can yield 
improvements not only for the grounding, but also for the model generation 
phase, as discussed in Section 7. This problem is inspired by a setting in 
planning, in particular testing whether a given plan is conformant with 
respect to a state transition diagram [30]. Such a diagram is essentially a 
directed graph formed of nodes representing states, and in which arcs are 
labeled by actions, meaning that executing the action in the source state will 
lead to the target state. In the considered setting nondeterminism is allowed, 
that is, executing an action in one state might lead nondeterministically to 
one of several successor states. A plan is a sequence of actions, and it is 
conformant with respect to a given initial state and a goal state if each 
possible execution of the action sequence leads to the goal state. 

In our benchmark, we assume that the action selection process has already 
been done, thus having reduced the state transition diagram to those 
transitions that actually occur when executing the given plan. Furthermore 
we assume that there are exactly two possible non-goal successor states for 
any given state. This can also be viewed as checking whether all outgoing paths of a 
node in a directed graph reach a particular confluence node. We encoded instances 
by facts ptrans(s0, s1, s2) meaning that one of states s1 and s2 will 
be reached in the plan execution starting from s0. The problem is encoded 
using 

trans(X,Y) v trans(X,Z) :- ptrans(X,Y,Z). 
reach(X,Y) :- trans(X,Y). 
reach(X,Y) :- reach(X,Z), trans(Z,Y). 

and the query reach(0, 1), where 0 is the initial state and 1 the goal state. If 
the query is cautiously true, the plan is conformant. The transition graphs 
in our experiments have the shape of a binary tree rooted in state 0, and 
from each leaf there is an arc to state 1, as depicted in Figure 7. 

Fig. 8. Simple Path: Average execution time 
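
To make the Conformant Plan Checking setting concrete, the following Python sketch generates a small tree-shaped instance and decides the cautious truth of reach(0,1) by brute force. It is only an illustration under explicit assumptions: the state numbering, the encoding of a leaf as a fact ptrans(leaf,1,1), and the function names are hypothetical and are not taken from the benchmark generator. The sketch enumerates one successor choice per ptrans fact; this may include non-minimal models, which does not change the cautious answer because reachability is monotone in the set of transitions.

```python
from itertools import product


def state(h):
    # Heap index 1 is the root (state 0); state 1 is reserved for the goal.
    return 0 if h == 1 else h


def binary_tree_instance(depth):
    """ptrans facts (s0, s1, s2) for a complete binary tree of the given depth."""
    facts = []
    for h in range(1, 2 ** (depth + 1)):
        if h < 2 ** depth:                       # inner node: two successors
            facts.append((state(h), state(2 * h), state(2 * h + 1)))
        else:                                    # leaf: arc to the goal state 1 (assumed encoding)
            facts.append((state(h), 1, 1))
    return facts


def reaches(trans, source, goal):
    """Depth-first search over the chosen transitions."""
    seen, stack = set(), [source]
    while stack:
        x = stack.pop()
        for y in trans.get(x, ()):
            if y == goal:
                return True
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def cautiously_reaches(facts, source=0, goal=1):
    """True iff the goal is reached under every choice of one successor per fact."""
    options = [sorted({y, z}) for (_, y, z) in facts]
    for choice in product(*options):
        trans = {}
        for (x, _, _), y in zip(facts, choice):
            trans.setdefault(x, set()).add(y)
        if not reaches(trans, source, goal):
            return False
    return True
```

The number of enumerated choices doubles with every inner node, which is why this naive check is only usable for very small depths and why pruning the search, as discussed for DMS in Section 5.2, matters on this benchmark.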

In addition, we have performed further experiments on an application scenario 
modeled from real-world data for answering user queries in a data integration 
setting. These latter experiments will be discussed in more detail in Section 6. 

5.2 Results and Discussion 

The experiments have been performed on a 3GHz Intel® Xeon® proces- 
sor system with 4GB RAM under the Debian 4.0 operating system with a 
GNU/Linux 2.6.23 kernel. The DLV prototype used has been compiled using 
GCC 4.3.3. For each instance, we have allowed a maximum running time of 
600 seconds (10 minutes) and a maximum memory usage of 3GB. 

On all considered problems, DMS performed at least as well as SMS, even though the SMS 
timings do not include the rewriting time, as discussed in Section 5.1. Let us analyze the results 
for each problem in more detail. 

The results for Simple Path are reported in Figure 8. DLV without Magic 
Sets solves only the smallest instances, with a very steep increase in execution 
time. SMS does better than DLV, but scales much worse than DMS. The dif- 
ference between SMS and DMS is mostly due to the grounding of the additional 
predicates that SMS introduces. 

Figure 9 reports the results for Related. Compared to Simple Path, DLV without 
Magic Sets exhibits an even steeper increase in runtime, while in contrast 
both SMS and DMS scale better than on Simple Path. Comparing SMS and DMS, 
we note that DMS appears to have an exponential speedup over SMS. In this 
case, the computational gain of DMS over SMS is due to the dynamic optimization 
of the model search phase resulting from our Magic Sets definition. This 
aspect is better highlighted by the Conformant Plan Checking benchmark, 
and will be discussed later in this section. 

For Strategic Companies, we report the results in Figure 10 as a bar diagram, 
because the instances do not have a uniform structure. The instances are, 
however, ordered by size. Also here, DLV without Magic Sets is clearly the 
least efficient of the tested systems, solving only the two smallest instances 
in the allotted time (600 seconds). Concerning the other systems, SMS and DMS 
essentially show equal performance. In fact, the situation here is quite different 
from Simple Path and Related, because grounding the program produced by the 
Magic Set rewriting takes only a negligible amount of time for SMS and DMS. 
For this benchmark the important feature is reducing the ground program to 
the part which is relevant for the query, and we could verify that the ground 
programs produced by SMS and DMS are precisely the same. 

Finally, the results for Conformant Plan Checking are shown in Figure 11. 
While DLV shows a similar behavior as for Simple Path and Related, here 
also SMS does not scale well at all, and in fact DMS appears to have an expo- 
nential speedup over SMS. There is a precise reason for this: While the Magic 
Set rewriting of SMS always creates a deterministic program defining the magic 
predicates, this is not true for DMS. As a consequence, all magic predicates are 
completely evaluated during the grounding phase of DLV for SMS, while for 
DMS this is not the case. At first glance, this may seem like a disadvantage 
of DMS, as one might believe that the ground program becomes larger. 
However, it is actually a big advantage of DMS, because it offers a more precise 
identification of the relevant part of the program. Roughly speaking, what- 
ever SMS identifies as relevant for the query will also be identified as relevant 
in DMS, but DMS can also include nondeterministic relevance information, which 
SMS cannot. This means that in DMS Magic Sets can be exploited also during 
the nondeterministic search phase of DLV, dynamically disabling parts of the 
ground program. In particular, after having made some choices, parts of the 
program may no longer be relevant to the query, but only because of these 
choices, and the magic atoms present in the ground program can render these 
parts satisfied, which means that they will no longer be considered in this 
part of the search. SMS cannot induce any behavior like this and its effect is 
limited to the grounding phase of DLV, which can make a huge difference, as 
evidenced by Conformant Plan Checking. 



5.3 Experimenting with DMS on other Disjunctive Datalog Systems 



Fig. 11. Conformant Plan Checking: Average execution time 

In order to assess the effectiveness of DMS on systems other than DLV, we 
tested the grounder Gringo [28] with the following solvers: ClaspD [21], Cmodels 
[46], GnT1 and GnT2 [37]. ClaspD is based on advanced Boolean constraint 
solving techniques, featuring backjumping and conflict-driven learning. Cmodels 
is based on the definition of program completion and loop formulas for 
disjunctive programs [40,47], and uses a SAT solver for generating candidate 
solutions and testing them. GnT1 is based on Smodels [61], a system handling 
Datalog programs with unstratified negation (normal programs): A disjunctive 
program is translated into a normal program, the stable models of which 
are computed by Smodels and represent stable model candidates of the original 
program. Each of these candidates is then checked to be a stable model 
of the original program by invoking Smodels on a second normal program. 
GnT2 is a variant of GnT1 in which the number of candidates produced by 
the first normal program is reduced by means of additional rules that discard 
unsupported models, i.e., models containing some atom $a$ for which there is 
no rule $r$ such that $B(r)$ is true and $a$ is the only true atom in $H(r)$. 
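
The notion of supportedness used above can be spelled out on ground programs with a few lines of Python. This is only a sketch of the definition, not of how GnT2 encodes the check with additional rules; the representation of a ground rule as a triple (head, positive body, negative body) of atom sets and the name `is_supported` are assumptions made for the illustration.

```python
def is_supported(model, rules):
    """Check that every atom of `model` has a supporting ground rule.

    A rule (head, pos, neg) supports atom a if its body is true in the model
    and a is the only true atom in its head.
    """
    def supports(rule, a):
        head, pos, neg = rule
        body_true = pos <= model and not (neg & model)
        return body_true and (head & model) == {a}

    return all(any(supports(rule, a) for rule in rules) for a in model)


# Ground program:  a v b.   c :- a.
program = [
    (frozenset({"a", "b"}), frozenset(), frozenset()),
    (frozenset({"c"}), frozenset({"a"}), frozenset()),
]
```

On this small program, the model {a, c} is supported, whereas {a, b, c} is not, because neither a nor b is the only true atom in the head of the disjunctive fact.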

All of the benchmarks presented in the previous section were tested on these 
systems. Since DMS is not implemented in these systems, rewritten programs 
were produced by DLV during the preparation of the experiment. We recall 
that DMS does not depend on EDB relations and point out that DLV computes 
rewritten programs for the considered encodings in 1-2 hundredths of a second. 
The results of our experiment are reported in Figures 12-16. In general, we 
tried to use consistent scales in the graphs in order to ease comparability. 
However, for some graphs we chose a different scale in order to keep them 
readable for the main purpose (comparing performances with and without 
DMS), and we mention this explicitly in the accompanying text. 

Concerning Simple Path, the advantages of DMS over the unoptimized encoding 
are evident on all tested systems. In fact, as shown in Figure 12, without DMS 
none of the tested systems answered in the allotted time (600 seconds) on 
instances with more than 400 nodes (900 for Cmodels). On the other hand, all 
of the instances considered in the benchmark (up to 40 thousand nodes) 
were solved by all tested solvers with the DMS encoding. We also observe that 
with DMS the tested systems are faster than DLV in this benchmark, which is 
a clear indication of the optimization potential that can be provided to these 
systems by our Magic Set technique. 

For Related we obtained a similar result, reported in Figure 13 (we used a 
different scale for the y-axis for Cmodels for readability). Without DMS only 
the smallest instances were solved in the allotted time (up to 2025 nodes 
for ClaspD and Cmodels, up to 625 nodes for GnT1 and GnT2). With DMS, 
instead, all tested systems solved the biggest instances of the benchmark (up 
to 10 thousand nodes). In particular, with DMS Cmodels performs as well 
as DLV in this benchmark. 



The effectiveness of DMS is also evident in the Strategic Companies benchmark 
(Figures 14-15). In fact, we observed significant performance gains for all systems 
on all tested instances. GnT1, which is already faster than the other tested 
systems in this benchmark, draws particular advantage from DMS, solving all 
instances in a few seconds. Further evidence of the optimization potential 
provided by DMS to these systems comes from comparing the number of solved 
instances: Of a total of 60 tests, we counted 37 timeouts on the unoptimized 
encoding (10 on ClaspD, 14 on Cmodels, 3 on GnT1 and 10 on GnT2), while 
just one on the encoding obtained by applying DMS. We point out that the 
timeout on the rewritten program was obtained by the Cmodels system, which 
alone collected 14 timeouts on the unoptimized encoding and is thus the least 
performant on this benchmark. 



Finally, consider the results for Conformant Plan Checking reported in Figure 16 
(we used a different scale on the y-axis for ClaspD for readability; 
note also that ClaspD and GnT2 only solved the smallest instances of this 
benchmark, and we thus used a different scale for their x-axes). The performance 
of ClaspD is poor in this benchmark; nonetheless, we observed a slight 
improvement in execution time if DMS is applied on the encoding reported 
in Section 5.1. Cmodels performs better than ClaspD in this case and the 
optimization potential of DMS emerges with an exponential improvement in 
performance. A similar result was observed for GnT1, while GnT2 on this 
benchmark is the only outlier of the experiment: Its performance deteriorates 
if the original program is processed by DMS. However, in this benchmark GnT2 
performs worse than GnT1 also with the original encoding. In fact, while GnT1 
solved the biggest instance (more than 65 thousand states) in 209.74 seconds 
(12.28 seconds with the DMS encoding), the execution of GnT2 did not 
terminate in the allotted time (600 seconds) on instances containing more than 
10 thousand states. We finally note that with DMS GnT1 and Cmodels are 
faster than DLV in this benchmark. In fact, for the biggest instance in the 
benchmark, GnT1 and Cmodels required 12.28 and 19.13 seconds, respectively, 
while DLV terminated in 279.41 seconds. The significant performance gain of 
GnT1 and Cmodels due to DMS is a further confirmation of the potential of 
our optimization technique. 

Fig. 16. Conformant Plan Checking: Average execution time on other systems 



6 Application to Data Integration 



In this section we give a brief account of a case study that evidences the impact 
of the Magic Set method when used on programs that realize data integration 
systems. We first give an overview of data integration systems, show how they 
can be implemented using Datalog$^{\vee,\neg_s}$, and finally assess the impact of Magic 
Sets on a data integration system involving real-world data. 



6.1 Data Integration Systems in a Nutshell 



The main goal of data integration systems is to offer transparent access to 
heterogeneous sources by providing users with a global schema, which users 
can query without having to know which sources the data come from. 
In fact, it is the task of the data integration system to identify and access 
the data sources which are relevant for finding the answer to a query over 
the global schema, followed by a combination of the data thus obtained. The 
data integration system uses a set of mapping assertions, which specify the 
relationship between the data sources and the global schema. Following [41], 
we formalize a data integration system $\mathcal{I}$ as a triple $\langle \mathcal{G}, \mathcal{S}, \mathcal{M} \rangle$, where: 

(1) $\mathcal{G}$ is the global (relational) schema, that is, a pair $\langle \Psi, \Sigma \rangle$, where $\Psi$ is a 
finite set of relation symbols, each with an associated positive arity, and 
$\Sigma$ is a finite set of integrity constraints (ICs) expressed on the symbols 
in $\Psi$. ICs are first-order assertions that are intended to be satisfied by 
database instances. 

(2) $\mathcal{S}$ is the source schema, constituted by the schemas of the various sources 
that are part of the data integration system. We assume that $\mathcal{S}$ is a relational 
schema of the form $\mathcal{S} = \langle \Psi', \emptyset \rangle$, which means that there are no 
integrity constraints on the sources. This assumption implies that data 
stored at the sources are locally consistent; this is a common assumption 
in data integration, because sources are in general external to the 
integration system, which is not in charge of analyzing or restoring their 
consistency. 

(3) $\mathcal{M}$ is the mapping which establishes the relationship between $\mathcal{G}$ and $\mathcal{S}$. 
In our framework, the mapping follows the GAV approach, that is, each 
global relation is associated with a view, i.e., a Datalog$^{\vee,\neg_s}$ query over the 
sources. 

The main semantic issue in data integration systems is that, since integrated 
sources are originally autonomous, their data, transformed via the mapping 
assertions, may not satisfy the constraints of the global schema. An approach 
to remedying this problem that has lately received a lot of interest in the 
literature (see, e.g., [3,11,12,14,16-19,25,26]) is based on the notion of repair 
for an inconsistent database as introduced in [4] . Roughly speaking, a repair of 
a database is a new database that satisfies the constraints in the schema, and 
minimally differs from the original one. Since an inconsistent database might 
possess multiple repairs, the standard approach in answering user queries is 
to return those answers that are true in every possible repair. These are called 
consistent answers in the literature. 
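
The notions of repair and consistent answer can be illustrated by a small Python sketch. It covers only a simplified case, which is an assumption of this example and not the general definition: a single relation with a key on its first `key_len` attributes, where repairs are obtained by keeping exactly one tuple for each key value. The function names and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def repairs(tuples, key_len=1):
    """Yield all repairs of a relation w.r.t. a key on its first key_len attributes."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[:key_len]].append(t)
    # A repair keeps exactly one tuple from each group of conflicting tuples.
    for choice in product(*groups.values()):
        yield set(choice)


def consistent_answers(tuples, query, key_len=1):
    """Answers returned by `query` in every repair."""
    answer_sets = [set(query(rep)) for rep in repairs(tuples, key_len)]
    return set.intersection(*answer_sets)


# student(id, name): id 1 violates the key constraint on id.
students = [(1, "ann"), (1, "anna"), (2, "bob")]
names = lambda db: {name for (_, name) in db}
ids = lambda db: {sid for (sid, _) in db}
```

Here the relation has two repairs, one for each name associated with id 1. The query for names has the single consistent answer "bob", while the query for ids consistently returns both 1 and 2, since every repair retains some tuple for each id.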

6.2 Consistent Query Answering via Datalog$^{\vee,\neg_s}$ Queries 

There is an intuitive relation between consistent answers to queries over data 
integration systems and queries over Datalog$^{\vee,\neg_s}$ programs: Indeed, if one could 
find a translation from data sources, mapping, and the query to a Datalog$^{\vee,\neg_s}$ 
program, which possesses a stable model for each possible repair, and a query 
over it, the consistent answers within the data integration system will correspond 
to cautious consequences of the obtained Datalog$^{\vee,\neg_s}$ setting. 

In fact, various authors [5,7,14,16,17,31] considered the idea of encoding the 
constraints of the global schema $\mathcal{G}$ into various kinds of logic programs, such 
that the stable models of this program yield the repairs of the database retrieved 
from the sources. Some of these approaches use logic programs with unstratified 
negation [16], whereas disjunctive Datalog programs together with 
unstratified negation have been considered in [13,51]. 

It has already been realized earlier that Magic Sets are a crucial optimization 
technique in this context, and indeed the availability of the transformational 
approach using stable logic programming as its core language was a main mo- 
tivation for the research presented in this article, since in this way a Magic 
Set method for stable logic programs immediately yields an optimization tech- 
nique for data integration systems. Indeed, the benefits of Magic Sets in the 
context of optimizing logic programs with unstratified negation (but without 
disjunction) have been discussed in [24]. The Magic Set technique defined in 
[24] is quite different from the one defined in this article, as it does not consider 
disjunctive rules, and works only for programs that are consistent, that is, 
have at least one stable model. In [51] our preliminary work reported in [20], 
which eventually led to the present article, has been expanded in an ad-hoc 
way to particular kinds of Datalog programs with disjunction and unstrati- 
fied negation. It is ad-hoc in the sense that it is tailored to programs which 
are created by the transformation described in [51]. The experimental results 
reported in [51] show huge computational advantages when using Magic Sets. 

We now report an alternative transformation which produces Datalog$^{\vee,\neg_s}$ programs 
(therefore, differently from [51], there are no unstratified occurrences of negation). 
This rewriting has been devised and used within the INFOMIX system 
on data integration [42]. 

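Let $\mathcal{I} = \langle \mathcal{G}, \mathcal{S}, \mathcal{M} \rangle$ be a data integration system where $\mathcal{G} = \langle \Psi, \Sigma \rangle$, and let $\mathcal{D}$ 
be a database for $\mathcal{G}$, which is represented as a set of facts over the relational 
predicates in $\mathcal{G}$. We assume that constraints over the global schema are key 
and exclusion dependencies. In particular, we recall that a set of attributes $\bar{x}$ 
is a key for the relation $r$ if: 

$$(r(\bar{x}, \bar{y}) \wedge r(\bar{x}, \bar{z})) \rightarrow \bar{y} = \bar{z}, \quad \forall \{r(\bar{x}, \bar{y}), r(\bar{x}, \bar{z})\} \subseteq \mathcal{D}$$ 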

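and that an exclusion dependency holds between a set of attributes $\bar{x}$ of a 
relation $r$ and a set of attributes $\bar{w}$ of a relation $s$ if 

$$(r(\bar{x}, \bar{y}) \wedge s(\bar{w}, \bar{z})) \rightarrow \bar{x} \neq \bar{w}, \quad \forall \{r(\bar{x}, \bar{y}), s(\bar{w}, \bar{z})\} \subseteq \mathcal{D}$$ 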

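Then, the disjunctive rewriting of a query $q$ with respect to $\mathcal{I}$ is the Datalog$^{\vee,\neg_s}$ 
program $\Pi(\mathcal{I}) = \Pi_{KD} \cup \Pi_{ED} \cup \Pi_{\mathcal{M}} \cup \Pi_{coll}$ where: 

• For each relation $r$ in $\mathcal{G}$ and for each key defined over its set of attributes 
$\bar{x}$, $\Pi_{KD}$ contains the rules: 

$$r_{out}(\bar{x}, \bar{y}) \vee r_{out}(\bar{x}, \bar{z}) \ \texttt{:-}\ r_{\mathcal{D}}(\bar{x}, \bar{y}),\ r_{\mathcal{D}}(\bar{x}, \bar{z}),\ Y_1 \neq Z_1.$$ 
$$\vdots$$ 
$$r_{out}(\bar{x}, \bar{y}) \vee r_{out}(\bar{x}, \bar{z}) \ \texttt{:-}\ r_{\mathcal{D}}(\bar{x}, \bar{y}),\ r_{\mathcal{D}}(\bar{x}, \bar{z}),\ Y_m \neq Z_m.$$ 

where $\bar{y} = Y_1, \ldots, Y_m$ and $\bar{z} = Z_1, \ldots, Z_m$. 

• For each exclusion dependency between a set of attributes $\bar{x}$ of a relation $r$ 
and a set of attributes $\bar{w}$ of a relation $s$, $\Pi_{ED}$ contains the following rule: 

$$r_{out}(\bar{x}, \bar{y}) \vee s_{out}(\bar{w}, \bar{z}) \ \texttt{:-}\ r_{\mathcal{D}}(\bar{x}, \bar{y}),\ s_{\mathcal{D}}(\bar{w}, \bar{z}),\ X_1 = W_1, \ldots, X_m = W_m.$$ 

where $\bar{x} = X_1, \ldots, X_m$ and $\bar{w} = W_1, \ldots, W_m$. In the implementation the 
following equivalent rule is used: 

$$r_{out}(\bar{x}, \bar{y}) \vee s_{out}(\bar{x}, \bar{z}) \ \texttt{:-}\ r_{\mathcal{D}}(\bar{x}, \bar{y}),\ s_{\mathcal{D}}(\bar{x}, \bar{z}).$$ 

• For each relation $r$ in $\mathcal{G}$, $\Pi_{coll}$ contains the rule: 

$$r(\bar{w}) \ \texttt{:-}\ r_{\mathcal{D}}(\bar{w}),\ \mathtt{not}\ r_{out}(\bar{w}).$$ 

• For each Datalog rule $r$ in $\mathcal{M}$ such that: 

$$k(\bar{t}) \ \texttt{:-}\ q_1(\bar{s}_1), \ldots, q_m(\bar{s}_m).$$ 

where $k$ is a relation in $\mathcal{G}$ and $q_i$ (for $1 \leq i \leq m$) is a relation in $\mathcal{S}$, $\Pi_{\mathcal{M}}$ 
contains the rule: 

$$k_{\mathcal{D}}(\bar{t}) \ \texttt{:-}\ q_1(\bar{s}_1), \ldots, q_m(\bar{s}_m).$$ 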

It can be shown that for each user query $Q$ (over $\mathcal{G}$) and for each source 
database $\mathcal{F}$ (over $\mathcal{S}$), consistent query answers to $Q$ precisely coincide with 
the set $Ans_c(Q, \Pi(\mathcal{I}) \cup \mathcal{F})$. Actually, within the INFOMIX project also inclusion 
dependencies have been considered according to the rewriting discussed 
in [16], whose details we omit for clarity. Since the rewriting for inclusion dependencies 
also modifies queries, in the INFOMIX project queries have been 
limited to conjunctive queries. It is however important to notice that the program 
$\Pi(\mathcal{I})$ contains only stratified negation and is therefore a Datalog$^{\vee,\neg_s}$ 
program, making the Magic Set method defined in this article applicable. 
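
As an illustration of how the key-related part of the rewriting is instantiated, the following Python sketch prints the rules for one relation whose key consists of its first attributes, together with the corresponding collecting rule. It is a sketch under assumptions and not the INFOMIX code: the suffixes `_out` and `_d` (standing for the "out" version of a relation and for its retrieved version), the variable naming, and the function `key_rewriting` are hypothetical, while `v` and `<>` follow the textual syntax of the benchmark encodings in Section 5.1.

```python
def key_rewriting(rel, arity, key_len):
    """Return the key-violation rules and the collecting rule for relation `rel`.

    Assumes the key is formed by the first key_len of the `arity` attributes.
    """
    xs = [f"X{i}" for i in range(1, key_len + 1)]
    ys = [f"Y{i}" for i in range(1, arity - key_len + 1)]
    zs = [f"Z{i}" for i in range(1, arity - key_len + 1)]

    def atom(suffix, args):
        return f"{rel}{suffix}({','.join(args)})"

    head = f"{atom('_out', xs + ys)} v {atom('_out', xs + zs)}"
    body = f"{atom('_d', xs + ys)}, {atom('_d', xs + zs)}"
    # One disjunctive rule per non-key attribute on which two tuples may differ.
    rules = [f"{head} :- {body}, {y} <> {z}." for y, z in zip(ys, zs)]
    # Collecting rule: keep the retrieved tuples that were not discarded.
    ws = xs + ys
    rules.append(f"{atom('', ws)} :- {atom('_d', ws)}, not {atom('_out', ws)}.")
    return rules


for rule in key_rewriting("student", 3, 1):
    print(rule)
```

Negation occurs only in the collecting rule and only on the "out" predicate, which never depends on the collected relation, so the generated program stays stratified.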

6.3 Experimental Results 

The effectiveness of the Magic Set method in this crucial application context 
has then been assessed via a number of experiments carried out on the demonstration 
scenario of the INFOMIX project, which refers to the information system 
of the University "La Sapienza" in Rome. The global schema consists of 14 
global relations with 29 constraints, while the data sources include 29 relations 
of 3 legacy databases and 12 wrappers generating relational data from web 
pages. This amounts to more than 24MB of data regarding students, professors 
and exams in several faculties of the university. For a detailed description 
of the INFOMIX project see https://www.mat.unical.it/infomix/. 

Fig. 17. Average execution time of query evaluation in the INFOMIX Demo Scenario 



On this schema, we have tested five typical queries with different characteristics, 
which model different use cases. For the sake of completeness, the full 
encodings of the tested queries are reported in the Appendix. In particular, we 
measured the average execution time of DLV computing $Ans_c(Q, \Pi(\mathcal{I}) \cup \mathcal{F})$ 
and $Ans_c(Q, \mathrm{DMS}(Q, \Pi(\mathcal{I})) \cup \mathcal{F})$ on datasets of increasing size. The experiments 
were performed by running the INFOMIX prototype system on a 3GHz 
Intel® Xeon® processor system with 4GB RAM under the Debian 4.0 operating 
system with a GNU/Linux 2.6.23 kernel. The DLV prototype used as the 
computational core of the INFOMIX system had been compiled using GCC 
4.3.3. For each instance, we allowed a maximum running time of 10 minutes 
and a maximum memory usage of 3GB. 

The results, reported in Figure 17, confirm that on these typical queries the 
performance is considerably improved by Magic Sets. On Queries 1 to 4 in 
Figure 17 the response time scales much better with Magic Sets than without, 
appearing essentially linear on the tested instance sizes, while without Magic 
Sets the behavior has a decidedly non-linear appearance. We also observe that 
there is basically no improvement on Query 5. We have analyzed this query 
and for this use case all data seems to be relevant to the query, which means 
that Magic Sets cannot have any positive effect. It is however important to 
observe that the Magic Set rewriting does not incur any significant overhead. 



7 Related Work 

In this section we first discuss the main body of work which is related to 
DMS, the technique developed in this paper for query answering optimization. 
In particular, we discuss Magic Set techniques for Datalog languages. The 
discussion is structured in paragraphs grouping techniques which cover the 
same language. After that, we discuss some applications for which DMS has 
already been exploited. All these applications refer to the preliminary work 
published in [20]. 

Magic Sets for Datalog. In order to optimize query evaluation in bottom- 
up systems, like deductive database systems, several works have proposed 
the simulation of top-down strategies by means of suitable transformations 
introducing new predicates and rewriting clauses. Among them, Magic Sets 
for Datalog queries are one of the best known logical optimization techniques 
for database systems. The method, first developed in [6], has been analyzed 
and refined by many authors; see, for instance, [9,55,62,63]. These works form 
the foundations of DMS. 

Magic Sets for Datalog$^{\neg_s}$. Many authors have addressed the issue of extending 
the Magic Set technique in order to deal with Datalog queries involving 
stratified negation. The main problem related to the extension of the 
technique to Datalog$^{\neg_s}$ programs is how to assign a semantics to the rewritten 
programs. Indeed, while Datalog$^{\neg_s}$ programs have a natural and accepted semantics, 
namely the perfect model semantics [2,64], the application of Magic 
Sets can introduce unstratified negation in the rewritten programs. Solutions 
have been presented in [10,38,39,59]. In particular, in [38,59] rewritten programs 
have been evaluated according to the well-founded semantics, a three-valued 
semantics for Datalog$^{\neg}$ programs which is two-valued for stratified programs, 
while in [10,39] ad-hoc semantics have been defined. All of these methods 
exploit a property of Datalog$^{\neg_s}$ which is not present in disjunctive Datalog: 
uniqueness of the intended model. This property in turn implies that query 
answering just consists in establishing the truth value of some atoms in one 
intended model. Using our terminology, brave and cautious reasoning coincide 
for these programs. Therefore, all these methods are quite different from DMS, 
the technique developed in this paper. 

Magic Sets for Datalog^{¬}. Extending the Magic Set technique to Datalog^{¬}
programs must face two major difficulties. First, for a Datalog^{¬} program
uniqueness of the intended model is no longer guaranteed, thus query
answering in this setting involves a set of stable models in general. The
second difficulty is that parts of a Datalog^{¬} program may act as
constraints, thus preventing a relevant interpretation from being a stable
model. In [24] a Magic Set method for Datalog^{¬} programs has been defined
and proved to be correct for coherent programs, i.e., programs admitting at
least one stable model. This method takes special precautions for relevant
parts of the program that act as constraints, called dangerous rules in
[24]. We observe that dangerous rules cannot occur in Datalog^{∨,¬s}
programs, which allows the simpler DMS algorithm to work correctly for this
class of programs.
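
Both difficulties can be seen on tiny ground programs. The following Python
sketch is a brute-force stable model enumerator based on the Gelfond-Lifschitz
reduct; it is not the method of [24] nor the DLV implementation, and the
function names and example rules are assumptions made for illustration only.

```python
from itertools import combinations

def least_model(positive_rules):
    """Least model of a negation-free ground program, by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if head not in model and all(a in model for a in pos):
                model.add(head)
                changed = True
    return model

def stable_models(rules):
    """Enumerate stable models of a ground program without disjunction.

    Each rule is (head, positive_body, negative_body), bodies being lists.
    Brute force: exponential in the number of atoms, for illustration only.
    """
    atoms = sorted({h for h, _, _ in rules} |
                   {a for _, pos, neg in rules for a in pos + neg})
    found = []
    for size in range(len(atoms) + 1):
        for candidate in combinations(atoms, size):
            m = set(candidate)
            # Reduct: drop rules blocked by m, then drop the negative literals.
            reduct = [(h, pos) for h, pos, neg in rules if not m.intersection(neg)]
            if least_model(reduct) == m:
                found.append(m)
    return found

# a :- not b.   b :- not a.
choice = [("a", [], ["b"]), ("b", [], ["a"])]
# Adding  f :- a, not f.  acts as a constraint on interpretations containing a.
constrained = choice + [("f", ["a"], ["f"])]

print(stable_models(choice))
print(stable_models(constrained))
```

The first program has two stable models, {a} and {b}, so the intended model is
not unique. In the second program the added rule behaves as a constraint and
only {b} remains a stable model.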

Magic Sets for Datalog^{∨}. The first extension of the Magic Set technique to
disjunctive Datalog is due to [32,33], where the SMS method has been
presented and proved to be correct for Datalog^{∨} programs. We point out
that the main drawback of this method is the introduction of collecting
predicates. Indeed, magic and collecting predicates of SMS have
deterministic definitions. As a consequence, their extension can be
completely computed during program instantiation, which means that no
further optimization is provided for the subsequent stable model search.
Moreover, while the correctness of DMS has been formally established for
Datalog^{∨,¬s} programs in general, the applicability of SMS to
Datalog^{∨,¬s} programs has only been outlined in [32,33].

Applications. Magic Sets have been applied in many contexts. In particular,
[13,36,51,53] have profitably exploited the optimization provided by DMS.
In [13,51] a data integration system has been presented; the system is
based on disjunctive Datalog and exploits DMS for fast query answering.
In [36,53], instead, an algorithm for answering queries over description
logic knowledge bases has been presented. More specifically, the algorithm
reduces a SHIQ knowledge base to a disjunctive Datalog program, so that DMS
can be exploited for query answering optimization.

8 Conclusion



The Magic Set method is one of the best-known techniques for the
optimization of positive recursive Datalog programs due to its efficiency
and its generality. Just a few other focused methods such as the
supplementary Magic Set and other special techniques for linear and chain
queries have gained similar visibility (see, e.g., [34,56,63]). After the
seminal papers [6,9], the viability of the approach was demonstrated, e.g.,
in [35,55]. Later on, extensions and refinements were proposed, addressing,
e.g., query constraints in [62], the well-founded semantics in [38], or
integration into cost-based query optimization in [60]. The research on
variations of the Magic Set method is still going on. For instance, in [24]
an extension of the Magic Set method was discussed for the class of
unstratified logic programs (without disjunction). In [10] a technique for
the class of soft-stratifiable programs was given. Finally, in [33] the
first variant of the technique for disjunctive programs (SMS) was described.

In this paper, we have elaborated on the issues addressed in [32,33]. Our 
approach is similar to SMS, but differs in several respects: 

• DMS is a dynamic optimization of query answering, in the sense that in 
addition to the optimization of the grounding process (which is the only 
optimization performed by SMS), DMS can drive the model generation phase 
by dynamically disabling parts of the program that become irrelevant in the 
considered partial interpretations. 

• DMS has a strong relationship with unfounded sets, allowing for a clean
application to disjunctive Datalog programs also in the presence of
stratified negation.

• DMS can be further improved by performing a subsequent subsumption 
check. 

• DMS is integrated into the DLV system [43], profitably exploiting the DLV
internal data structures and the ability to control the grounding module.

We have conducted experiments on several benchmarks, many of which are taken
from the literature. The results of our experimentation show that our
implementation outperforms SMS in general, often by an exponential factor.
This is mainly due to the optimization of the model generation phase, which
is specific to our Magic Set technique. In addition, we have conducted
further experiments on a real application scenario, which show that Magic
Sets can play a crucial role in optimizing consistent query answering over
inconsistent databases. Importantly, other authors have already recognized
the benefits of our optimization strategies with respect to this very
important application domain [51], thereby confirming the validity and the
robustness of the work discussed in this paper.

We conclude by observing that it has been noted in the literature (e.g., in
[38]) that in the non-disjunctive case memoing techniques lead to similar
computations as evaluations after Magic Set transformations. Also in the
disjunctive case such techniques have been proposed (e.g., Hyper Tableaux
[8]), for which similar relations might hold. While [38] has already shown
that an advantage of Magic Sets over such methods is that they may be more
easily combined with other optimization techniques, we believe that
achieving a deeper comprehension of the relationships among these
techniques constitutes an interesting avenue for further research.

Another issue that we leave for future work is to study the impact of changing 
some parameters of the DMS method, in particular the impact of different 
SIPSes. 

A Queries on the INFOMIX Demo Scenario



INFOMIX is a project that was funded by the European Commission in its
Information Society Technologies track of the Sixth Framework Programme
for providing an advanced system for information integration. A detailed
description of the project, including references in the literature, can be
found at https://www.mat.unical.it/infomix/. Five typical queries of the
INFOMIX demo scenario have been considered for assessing Dynamic Magic
Sets. The full encodings of the tested queries are reported in Figures
A.1-A.2. Note that the encodings include the transformation described in
Section 6, and that underlined predicates denote source relations.

course_D(X1,X2) :- esame(_,X1,X2,_).
course_D(X1,X2) :- esame_diploma(X1,X2).
exam_record_D(X1,X2,Z,W,X4,X5,Y) :- affidamenti_ing_informatica(X2,X3,Y),
    dati_esami(X1,_,X2,X5,X4,_,Y), dati_professori(X3,Z,W).
exam_record_out(X1,X2,X3,X4,Y5,Y6,Y7) ∨ exam_record_out(X1,X2,X3,X4,Z5,Z6,Z7) :-
    exam_record_D(X1,X2,X3,X4,Y5,Y6,Y7), exam_record_D(X1,X2,X3,X4,Z5,Z6,Z7), Y5 ≠ Z5.
exam_record_out(X1,X2,X3,X4,Y5,Y6,Y7) ∨ exam_record_out(X1,X2,X3,X4,Z5,Z6,Z7) :-
    exam_record_D(X1,X2,X3,X4,Y5,Y6,Y7), exam_record_D(X1,X2,X3,X4,Z5,Z6,Z7), Y6 ≠ Z6.
exam_record_out(X1,X2,X3,X4,Y5,Y6,Y7) ∨ exam_record_out(X1,X2,X3,X4,Z5,Z6,Z7) :-
    exam_record_D(X1,X2,X3,X4,Y5,Y6,Y7), exam_record_D(X1,X2,X3,X4,Z5,Z6,Z7), Y7 ≠ Z7.
course(X1,X2) :- course_D(X1,X2), not course_out(X1,X2).
exam_record(X1,X2,X3,X4,X5,X6,X7) :- exam_record_D(X1,X2,X3,X4,X5,X6,X7),
    not exam_record_out(X1,X2,X3,X4,X5,X6,X7).
query1(CD) :- course(C,CD), exam_record("09089903",C,_,_,_,_,_).
query1(CD)?

student_D(X1,X2,X3,X4,X5,X6,X7) :- diploma_maturita(Y,X7),
    studente(X1,X3,X2,_,_,_,_,_,_,_,_,_,X6,X5,_,_,X4,_,_,_,_,Y,_).
student(X1,X2,X3,X4,X5,X6,X7) :- student_D(X1,X2,X3,X4,X5,X6,X7),
    not student_out(X1,X2,X3,X4,X5,X6,X7).
query2(SFN,SLN,COR,ADD,TEL,HSS) :- student("09089903",SFN,SLN,COR,ADD,TEL,HSS).
query2(SFN,SLN,COR,ADD,TEL,HSS)?

student_D(X1,X2,X3,X4,X5,X6,X7) :- diploma_maturita(Y,X7),
    studente(X1,X3,X2,_,_,_,_,_,_,_,_,_,X6,X5,_,_,X4,_,_,_,_,Y,_).
student_course_plan_D(X1,X2,X3,X4,X5) :- orientamento(Y1,X3),
    piano_studi(X1,X2,Y1,X4,Y2,_,_,_,_,_), stato(Y2,X5).
student(X1,X2,X3,X4,X5,X6,X7) :- student_D(X1,X2,X3,X4,X5,X6,X7),
    not student_out(X1,X2,X3,X4,X5,X6,X7).
student_course_plan(X1,X2,X3,X4,X5) :- student_course_plan_D(X1,X2,X3,X4,X5),
    not student_course_plan_out(X1,X2,X3,X4,X5).
query3(SID,SLN,R) :- student(SID,"2NEPB",SLN,_,_,_,_),
    student_course_plan(_,SID,_,R,"APPROVATO SENZA MODIFICHE").
query3(SID,SLN,R)?

Fig. A.1. INFOMIX Queries 1-3

student_D(X1,X2,X3,X4,X5,X6,X7) :- diploma_maturita(Y,X7),
    studente(X1,X3,X2,_,_,_,_,_,_,_,_,_,X6,X5,_,_,X4,_,_,_,_,Y,_).
course_D(X1,X2) :- esame(_,X1,X2,_).
course_D(X1,X2) :- esame_diploma(X1,X2).
student_course_plan_D(X1,X2,X3,X4,X5) :- orientamento(Y1,X3),
    piano_studi(X1,X2,Y1,X4,Y2,_,_,_,_,_), stato(Y2,X5).
plan_data_D(X1,X2,X3) :- dati_piano_studi(X1,X2,_),
    esame_ingegneria(X2,Y3,Y2,_), tipo_esame(Y2,X3).
student(X1,X2,X3,X4,X5,X6,X7) :- student_D(X1,X2,X3,X4,X5,X6,X7),
    not student_out(X1,X2,X3,X4,X5,X6,X7).
student_course_plan(X1,X2,X3,X4,X5) :- student_course_plan_D(X1,X2,X3,X4,X5),
    not student_course_plan_out(X1,X2,X3,X4,X5).
plan_data(X1,X2,X3) :- plan_data_D(X1,X2,X3), not plan_data_out(X1,X2,X3).
course(X1,X2) :- course_D(X1,X2), not course_out(X1,X2).
query4(F,S) :- course(CID,"RETILOGICHE"), plan_data(SCID,CID,_),
    student(SID,F,S,"ROMA",_,_,_), student_course_plan(SCID,SID,_,_,_).
query4(F,S)?

course_D(X1,X2) :- esame(_,X1,X2,_).
course_D(X1,X2) :- esame_diploma(X1,X2).
student_course_plan_D(X1,X2,X3,X4,X5) :- orientamento(Y1,X3),
    piano_studi(X1,X2,Y1,X4,Y2,_,_,_,_,_), stato(Y2,X5).
plan_data_D(X1,X2,X3) :- dati_piano_studi(X1,X2,_),
    esame_ingegneria(X2,Y3,Y2,_), tipo_esame(Y2,X3).
student_course_plan(X1,X2,X3,X4,X5) :- student_course_plan_D(X1,X2,X3,X4,X5),
    not student_course_plan_out(X1,X2,X3,X4,X5).
plan_data(X1,X2,X3) :- plan_data_D(X1,X2,X3), not plan_data_out(X1,X2,X3).
course(X1,X2) :- course_D(X1,X2), not course_out(X1,X2).
query5(D) :- course(E,D), plan_data(C,E,_), student_course_plan(C,"09089903",_,_,_).
query5(D)?

Fig. A.2. INFOMIX Queries 4-5